Building a Production-Scale Dimensional Data Mart With Lakeflow Spark Declarative Pipelines and AUTO CDC
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Retail & Consumer Goods |
| Technologies | Databricks SQL, Lakeflow, Unity Catalog |
| Skill Level | Intermediate |
At 84.51˚, we replaced legacy ETL processing in our retail sales data mart with Lakeflow Spark Declarative Pipelines to simplify development while improving reliability at scale. In this session, we’ll walk through how a low-code, declarative approach enables dimensional modeling for both slowly changing dimensions and high-volume fact data.

You’ll see how to use AUTO CDC to handle data updates across batch and streaming pipelines, supporting continuously arriving sales transactions alongside rapidly changing product, store and customer attributes. Our ETL crunches approximately 8 million sales transactions and over 80 million purchased items per day in support of our 84.51˚ Stratum platform.

Attendees will leave with practical guidance for designing and operating dimensional data marts using Spark Declarative Pipelines, including patterns for CDC, schema evolution and hybrid batch/streaming workloads that can be applied immediately in production environments.
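As a rough sketch of the AUTO CDC pattern the session covers, the Databricks SQL below maintains an SCD Type 2 dimension from a change feed. Table and column names (customers_raw, customers_dim, customer_id, op, event_ts) are illustrative assumptions, not the speakers' actual pipeline, and exact syntax may vary by Databricks runtime version.

```sql
-- Hypothetical SCD Type 2 dimension fed by a CDC stream.
-- All object names here are placeholders for illustration.
CREATE OR REFRESH STREAMING TABLE customers_dim;

CREATE FLOW customers_cdc_flow AS AUTO CDC INTO customers_dim
FROM STREAM(customers_raw)
KEYS (customer_id)               -- business key used to match change rows
APPLY AS DELETE WHEN op = "D"    -- treat these change events as deletes
SEQUENCE BY event_ts             -- ordering column for late/out-of-order events
COLUMNS * EXCEPT (op, event_ts)  -- keep CDC metadata out of the target table
STORED AS SCD TYPE 2;            -- retain full history of attribute changes
```

The same flow stored as SCD Type 1 would overwrite rows in place, which is the trade-off the declarative syntax makes easy to switch between for dimension versus fact handling.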
Session Speakers
Scott Gordon
Lead Data Engineer
84.51˚
Shu Li
Sr. Specialist Solutions Architect
Databricks