From Repetition to Reuse: The Evolution of Apache Spark™ Declarative Pipelines
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Enterprise Technology, Consulting & Services |
| Technologies | Lakeflow |
| Skill Level | Beginner |
Learn how to build batch and streaming pipelines faster while improving correctness and reducing operational complexity with Apache Spark™ Declarative Pipelines.

Production Spark pipelines often require extensive orchestration code for dependency management, checkpointing, retries, and execution ordering—surrounding a relatively small amount of transformation logic. As pipelines scale, this scaffolding becomes increasingly difficult to maintain and evolve.

Introduced in Spark 4.1, Spark Declarative Pipelines (SDP) shifts this model by allowing developers to declare datasets and transformations while Spark constructs and manages the execution plan. By separating what a pipeline does from how it runs, SDP reduces boilerplate and accelerates time to production.

We’ll examine the architectural foundations of declarative development in Spark and how SDP handles dependency resolution, parallelization, checkpoint coordination, and failure recovery. We’ll also cover incremental processing and emerging testing patterns for declarative pipelines.
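To make the declarative model concrete, the fragment below is a minimal sketch of a pipeline definition in the style of Spark Declarative Pipelines. The module path (`pyspark.pipelines`), decorator names, and table names here are illustrative assumptions, not a definitive API reference; the key idea is that each function declares a dataset, and Spark infers the dependency graph and execution order from the table references rather than from hand-written orchestration code.

```python
# Illustrative sketch of a declarative pipeline definition.
# Module path and decorator names are assumptions for illustration;
# consult the Spark 4.1 documentation for the exact API.
from pyspark import pipelines as dp
from pyspark.sql import functions as F

@dp.materialized_view
def raw_orders():
    # Declares a dataset; Spark decides when and how to compute it.
    return spark.read.table("source_catalog.sales.orders")

@dp.materialized_view
def daily_revenue():
    # Referencing raw_orders lets Spark infer the dependency edge,
    # so execution ordering needs no explicit orchestration code.
    return (
        spark.read.table("raw_orders")
        .groupBy(F.to_date("order_ts").alias("order_date"))
        .agg(F.sum("amount").alias("revenue"))
    )
```

Because the definitions only state *what* each dataset is, concerns such as retries, checkpointing, and parallelization are handled by the pipeline runtime rather than by the author.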
Session Speakers
Lisa Cao
Staff Developer Relations
Databricks
Andreas Neumann
Databricks