Mapping Data Quality Concerns to Data Lake Zones
- Data Security and Governance
- Moscone South | Upper Mezzanine | 152
- 35 min
A common pattern in Data Lake and Lakehouse design is structuring data into zones, with Bronze, Silver and Gold being typical labels. Each zone is suitable for different workloads and different consumers: for instance, machine learning algorithms typically process against Bronze or Silver, while analytic dashboards often query Gold. This prompts the question: which layer is best suited for applying data quality rules and actions? Our answer: all of them.
In this session, we’ll expand on our answer by describing the purposes of the different zones, and mapping the categories of data quality relevant for each by assessing its qualitative requirements. We’ll describe Data Enrichment: the practice of making observed anomalies available as inputs to downstream data pipelines, and provide recommendations for when to merely alert, when to quarantine data, when to halt pipelines, and when to apply automated corrective actions.