Data + AI Summit 2022

How unsupervised machine learning can scale data quality monitoring in Databricks

On Demand

Type

  • Sponsored Session

Format

  • In-Person

Track

  • Sponsored Session

Difficulty

  • Intermediate

Room

  • Moscone South | Level 2 | 216

Duration

  • 35 min

Overview

Technologies like Databricks Delta Lake and Databricks SQL enable enterprises to store and query their data. But existing rule-based and metric-based approaches to monitoring the quality of this data are tedious to set up and maintain, fail to catch unexpected issues, and generate false-positive alerts that lead to alert fatigue.

In this talk, Jeremy will describe a set of fully unsupervised machine learning algorithms for monitoring data quality at scale in Databricks. He will cover how the algorithms work, their strengths and weaknesses, and how they are tested and calibrated.

Participants will leave this talk with an understanding of unsupervised data quality monitoring, its strengths and weaknesses, and how to begin using it to monitor data in Databricks.

Session Speakers

Jeremy Stanley

Co-Founder & CTO

Anomalo
