Clean, Correct Change Data Capture with Spark Declarative Pipelines
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Enterprise Technology, Consulting & Services |
| Technologies | Lakeflow |
| Skill Level | Intermediate |
Learn how to construct clean, correct, cost-conscious CDC pipelines that conquer chaotic change, column creep and complicated SCD calculations. Today, implementing CDC typically means writing complex, expensive merge logic. Handling straightforward cases is manageable — but complexity escalates quickly when you introduce out-of-order data, duplicates, schema evolution and slowly changing dimensions. What works in development often breaks under real production conditions.
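To make that escalation concrete, here is a minimal plain-Python sketch of the bookkeeping an imperative SCD Type 2 merge must get right by hand: deduplicating events, ordering by a sequence column, and closing out superseded versions. All names here are hypothetical for illustration; this is not the SDP implementation.

```python
def apply_scd2(events):
    """Apply change events as SCD Type 2 history.

    events: list of (key, value, seq) change rows, possibly
    out of order and with duplicates.
    Returns a list of (key, value, seq_start, seq_end) rows,
    where seq_end=None marks the current version.
    """
    # Out-of-order, duplicated input must be deduped and sorted
    # by key and sequence before any merge logic runs.
    events = sorted(set(events), key=lambda e: (e[0], e[2]))
    history = []
    open_row = {}  # key -> index of that key's currently open row
    for key, value, seq in events:
        if key in open_row:
            idx = open_row[key]
            if history[idx][1] == value:
                continue  # no-op change: skip, don't open a new version
            # Close the previous version at this event's sequence number.
            history[idx] = history[idx][:3] + (seq,)
        history.append((key, value, seq, None))
        open_row[key] = len(history) - 1
    return history

# Duplicate and out-of-order events are handled correctly:
rows = apply_scd2([("u1", "b", 3), ("u1", "a", 1), ("u1", "a", 1)])
# -> [("u1", "a", 1, 3), ("u1", "b", 3, None)]
```

Even this toy version omits deletes, schema evolution, and incremental state across batches; production merge logic must handle all of them, which is the complexity the session targets.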
Spark Declarative Pipelines (SDP) addresses this with Auto CDC, a declarative API that abstracts away merge orchestration and correctness guarantees. Instead of implementing imperative patterns, you declare intent — such as applying SCD Type 2 — and SDP executes it with built-in consistency, incremental processing and resilience.
In this session, we’ll walk through the Auto CDC API, examine how it automates complex merge operations and show how declarative CDC improves reliability across batch and streaming workloads.
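By contrast, the declarative approach reduces the same work to stated intent. The sketch below follows the shape of the Lakeflow/DLT Python CDC API; it only runs inside a Databricks pipeline, and the exact function and parameter names shown here are assumptions that may differ by release:

```python
import dlt

# Declare the target table; the pipeline manages its lifecycle.
dlt.create_streaming_table("customers")

# Declare intent: apply CDC events as SCD Type 2.
# Ordering, dedup, and merge orchestration are handled by the engine.
dlt.create_auto_cdc_flow(
    target="customers",        # table declared above
    source="cdc_events",       # streaming source of change rows
    keys=["customer_id"],      # key columns for matching
    sequence_by="event_ts",    # resolves out-of-order events
    stored_as_scd_type=2,      # keep full change history
)
```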
Session Speakers
Andreas Neumann
Senior Staff Software Engineer
Databricks
Joseph Torres
Staff Software Engineer
Databricks