Building a Production-Scale Dimensional Data Mart With Lakeflow Spark Declarative Pipelines and AUTO CDC
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Retail & Consumer Goods |
| Technologies | Databricks SQL, Lakeflow, Unity Catalog |
| Skill Level | Intermediate |
At 84.51˚, we replaced legacy ETL processing in our retail sales data mart with Lakeflow Spark Declarative Pipelines to simplify development while improving reliability at scale. In this session, we’ll walk through how a low-code, declarative approach enables dimensional modeling for both slowly changing dimensions and high-volume fact data.

You’ll see how to use AUTO CDC to handle data updates across batch and streaming pipelines, supporting continuously arriving sales transactions alongside rapidly changing product, store and customer attributes. Our ETL crunches approximately 8 million sales transactions and over 80 million purchased items per day in support of our 84.51˚ Stratum platform.

Attendees will leave with practical guidance for designing and operating dimensional data marts using Spark Declarative Pipelines, including patterns for CDC, schema evolution and hybrid batch/streaming workloads that can be applied immediately in production environments.
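As a rough sketch of the AUTO CDC pattern the session covers, the Databricks SQL below maintains an SCD Type 2 dimension from a change feed. Table and column names (customers_raw, customers_dim, customer_id, op, event_ts) are illustrative assumptions, not the speakers' actual pipeline, and exact syntax may vary by Databricks runtime version.

```sql
-- Hypothetical SCD Type 2 dimension fed by a CDC stream.
-- All object names here are placeholders for illustration.
CREATE OR REFRESH STREAMING TABLE customers_dim;

CREATE FLOW customers_cdc_flow AS AUTO CDC INTO customers_dim
FROM STREAM(customers_raw)
KEYS (customer_id)               -- business key used to match change rows
APPLY AS DELETE WHEN op = "D"    -- treat these change events as deletes
SEQUENCE BY event_ts             -- ordering column for late/out-of-order events
COLUMNS * EXCEPT (op, event_ts)  -- keep CDC metadata out of the target table
STORED AS SCD TYPE 2;            -- retain full history of attribute changes
```

The same flow stored as SCD Type 1 would overwrite rows in place, which is the trade-off the declarative syntax makes easy to switch between for dimension versus fact handling.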
Session Speakers
Scott Gordon
Lead Data Engineer
84.51˚
Shu Li
Sr. Specialist Solutions Architect
Databricks