Data + AI Summit 2022

Mapping Data Quality Concerns to Data Lake Zones

On Demand

Type

  • Session

Format

  • In-Person

Track

  • Data Security and Governance

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 152

Duration

  • 35 min

Overview

A common pattern in Data Lake and Lakehouse design is structuring data into zones, with Bronze, Silver and Gold being typical labels. Each zone suits different workloads and different consumers: for instance, machine learning algorithms typically process data from Bronze or Silver, while analytic dashboards often query Gold. This prompts the question: which layer is best suited for applying data quality rules and actions? Our answer: all of them.

In this session, we’ll expand on our answer by describing the purpose of each zone and mapping the categories of data quality relevant to each zone by assessing its qualitative requirements. We’ll describe Data Enrichment: the practice of making observed anomalies available as inputs to downstream data pipelines. We’ll also provide recommendations for when to merely alert, when to quarantine data, when to halt pipelines, and when to apply automated corrective actions.
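
As a rough illustration of the ideas in this abstract, the sketch below attaches observed anomalies to each record so that downstream steps can read them, then looks up an action per zone. It is a minimal sketch and not the speakers' implementation: the `Record`, `detect` and `decide` names, the two anomaly labels, and every entry in `POLICY` are hypothetical and chosen only to show the shape of such a mapping. The session's actual recommendations are not stated on this page.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ALERT = "alert"
    QUARANTINE = "quarantine"
    HALT = "halt"
    CORRECT = "correct"


# Hypothetical policy table: (zone, anomaly) -> action.
# These pairings are illustrative assumptions, not recommendations from the session.
POLICY = {
    ("bronze", "missing_value"): Action.ALERT,
    ("silver", "missing_value"): Action.QUARANTINE,
    ("gold", "missing_value"): Action.HALT,
    ("silver", "untrimmed_text"): Action.CORRECT,
}


@dataclass
class Record:
    payload: dict
    # Enrichment: observed anomalies travel with the record
    # so that downstream pipelines can use them as inputs.
    anomalies: list = field(default_factory=list)


def detect(record):
    """Label anomalies on the record instead of silently dropping it."""
    for value in record.payload.values():
        if value is None:
            record.anomalies.append("missing_value")
        elif isinstance(value, str) and value != value.strip():
            record.anomalies.append("untrimmed_text")
    return record


def decide(zone, record):
    """Return the actions the policy prescribes; unlisted pairs default to ALERT."""
    return {POLICY.get((zone, a), Action.ALERT) for a in record.anomalies}


if __name__ == "__main__":
    rec = detect(Record({"id": 1, "name": " Ada ", "email": None}))
    for zone in ("bronze", "silver", "gold"):
        print(zone, sorted(a.value for a in decide(zone, rec)))
```

Keeping the policy in a lookup table separates what was observed from what to do about it, so the same enriched record can trigger different actions in different zones.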

Session Speakers

Stewart Bryson

Co-founder & Chief Customer Officer

Qualytics
