Session

Clean, Correct Change Data Capture with Spark Declarative Pipelines

Overview

Experience: In Person
Track: Data Engineering & Streaming
Industry: Enterprise Technology, Consulting & Services
Technologies: Lakeflow
Skill Level: Intermediate

Learn how to construct clean, correct, cost-conscious CDC pipelines that conquer chaotic change, column creep and complicated SCD calculations. Today, implementing CDC typically means writing complex, expensive merge logic. Handling straightforward cases is manageable — but complexity escalates quickly when you introduce out-of-order data, duplicates, schema evolution and slowly changing dimensions. What works in development often breaks under real production conditions.
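To make the complexity concrete, here is a minimal, self-contained sketch of the bookkeeping an imperative SCD Type 2 merge has to get right: deduplicating redelivered events, reordering late arrivals by a sequence column, closing the current row and opening a new one. All names here (`Version`, `apply_scd2`, the tuple layout) are hypothetical illustrations, not part of any Spark API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    # One SCD Type 2 row: valid from start_seq until end_seq (None = current).
    value: str
    start_seq: int
    end_seq: Optional[int] = None

def apply_scd2(events, history=None):
    """events: (key, value, seq) change records, possibly out of order
    and with duplicate deliveries. Returns key -> list of Versions."""
    history = history or {}
    seen = set()
    ordered = []
    # Sort by sequence number to repair out-of-order arrival,
    # and drop exact duplicate deliveries of the same event.
    for key, value, seq in sorted(events, key=lambda e: e[2]):
        if (key, seq) in seen:
            continue
        seen.add((key, seq))
        ordered.append((key, value, seq))
    for key, value, seq in ordered:
        versions = history.setdefault(key, [])
        if versions and versions[-1].end_seq is None:
            if versions[-1].value == value:
                continue  # no-op change: keep the current row open
            versions[-1].end_seq = seq  # close the current row
        versions.append(Version(value, seq))  # open the new row
    return history

# Out-of-order input plus a duplicate delivery for one key:
h = apply_scd2([("u1", "gold", 3), ("u1", "silver", 1), ("u1", "silver", 1)])
```

Even this toy version ignores deletes, schema changes, and restarts; a production merge must handle all of them, which is exactly the escalation the session describes.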

Spark Declarative Pipelines (SDP) addresses this with Auto CDC, a declarative API that abstracts away merge orchestration and correctness guarantees. Instead of implementing imperative patterns, you declare intent — such as applying SCD Type 2 — and SDP executes it with built-in consistency, incremental processing and resilience.
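As a rough illustration of what "declaring intent" looks like, the sketch below uses the DLT/Lakeflow Python API as best recalled (`dlt.create_streaming_table`, `dlt.apply_changes`); treat the exact function names, parameters, and table names as assumptions to verify against the current Lakeflow documentation, not a definitive example. It is a pipeline definition, so it runs only inside a Databricks pipeline, not as a standalone script.

```python
import dlt
from pyspark.sql.functions import col

# Declare the target table; SDP manages its lifecycle.
dlt.create_streaming_table("customers_history")

# Declare the intent: apply CDC events as SCD Type 2.
# SDP handles ordering (sequence_by), dedup, and merge orchestration.
dlt.apply_changes(
    target="customers_history",        # assumed target table name
    source="customers_cdc_feed",       # assumed CDC source view/table
    keys=["customer_id"],              # primary key columns
    sequence_by=col("event_ts"),       # ordering column for late/out-of-order data
    stored_as_scd_type=2,              # keep full history rather than overwrite
)
```

The contrast with the imperative approach is the point: the ordering, deduplication, and row open/close logic is no longer user code, so it cannot drift or break under production conditions.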

In this session, we’ll walk through the Auto CDC API, examine how it automates complex merge operations and show how declarative CDC improves reliability across batch and streaming workloads.

Session Speakers

Andreas Neumann

Senior Staff Software Engineer
Databricks

Joseph Torres

Staff Software Engineer
Databricks