Scaling Data Quality at Zillow: Migrating and Enhancing Data Quality Systems on Databricks
Overview
Experience | In Person |
---|---|
Type | Lightning Talk |
Track | Data Engineering and Streaming |
Industry | Enterprise Technology |
Technologies | Apache Spark |
Skill Level | Intermediate |
Zillow has well-established, comprehensive systems for defining and enforcing data quality contracts and detecting anomalies.
In this session, we will share how we evaluated Databricks’ native data quality features, and why we chose DLT expectations for DLT pipelines and a combination of enforced constraints and custom queries for other job types. Our evaluation considered factors such as performance overhead, cost and scalability. We’ll highlight key improvements over our previous system and demonstrate how these choices have enabled Zillow to enforce scalable, production-grade data quality.
Additionally, we are actively testing Databricks’ latest data quality innovations, including enhancements to lakehouse monitoring and the newly released DQX project from Databricks Labs.
In summary, we will cover Zillow’s approach to data quality in the lakehouse, key lessons from our migration and actionable takeaways.
Session Speakers
Laura Zhou, Zillow