Clean, Correct Change Data Capture with Spark Declarative Pipelines
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Enterprise Technology, Consulting & Services |
| Technologies | Lakeflow |
| Skill Level | Intermediate |
Learn how to construct clean, correct, cost-conscious CDC pipelines that conquer chaotic change, column creep and complicated SCD calculations. Today, implementing CDC typically means writing complex, expensive merge logic. Handling straightforward cases is manageable — but complexity escalates quickly when you introduce out-of-order data, duplicates, schema evolution and slowly changing dimensions. What works in development often breaks under real production conditions.
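To make that escalation concrete, here is a minimal plain-Python sketch of the bookkeeping an imperative SCD Type 2 merge must get right by hand: deduplicating events, ordering by a sequence column, and closing out superseded versions. All names here are hypothetical for illustration; this is not the SDP implementation.

```python
def apply_scd2(events):
    """Apply change events as SCD Type 2 history.

    events: list of (key, value, seq) change rows, possibly
    out of order and with duplicates.
    Returns a list of (key, value, seq_start, seq_end) rows,
    where seq_end=None marks the current version.
    """
    # Out-of-order, duplicated input must be deduped and sorted
    # by key and sequence before any merge logic runs.
    events = sorted(set(events), key=lambda e: (e[0], e[2]))
    history = []
    open_row = {}  # key -> index of that key's currently open row
    for key, value, seq in events:
        if key in open_row:
            idx = open_row[key]
            if history[idx][1] == value:
                continue  # no-op change: skip, don't open a new version
            # Close the previous version at this event's sequence number.
            history[idx] = history[idx][:3] + (seq,)
        history.append((key, value, seq, None))
        open_row[key] = len(history) - 1
    return history

# Duplicate and out-of-order events are handled correctly:
rows = apply_scd2([("u1", "b", 3), ("u1", "a", 1), ("u1", "a", 1)])
# -> [("u1", "a", 1, 3), ("u1", "b", 3, None)]
```

Even this toy version omits deletes, schema evolution, and incremental state across batches; production merge logic must handle all of them, which is the complexity the session targets.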
Spark Declarative Pipelines (SDP) addresses this with Auto CDC, a declarative API that abstracts away merge orchestration and correctness guarantees. Instead of implementing imperative patterns, you declare intent — such as applying SCD Type 2 — and SDP executes it with built-in consistency, incremental processing and resilience.
In this session, we’ll walk through the Auto CDC API, examine how it automates complex merge operations and show how declarative CDC improves reliability across batch and streaming workloads.
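By contrast, the declarative approach reduces the same work to stated intent. The sketch below follows the shape of the Lakeflow/DLT Python CDC API; it only runs inside a Databricks pipeline, and the exact function and parameter names shown here are assumptions that may differ by release:

```python
import dlt

# Declare the target table; the pipeline manages its lifecycle.
dlt.create_streaming_table("customers")

# Declare intent: apply CDC events as SCD Type 2.
# Ordering, dedup, and merge orchestration are handled by the engine.
dlt.create_auto_cdc_flow(
    target="customers",        # table declared above
    source="cdc_events",       # streaming source of change rows
    keys=["customer_id"],      # key columns for matching
    sequence_by="event_ts",    # resolves out-of-order events
    stored_as_scd_type=2,      # keep full change history
)
```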
Session Speakers
Andreas Neumann
Senior Staff Software Engineer
Databricks
Joseph Torres
Staff Software Engineer
Databricks