From Repetition to Reuse: The Evolution of Apache Spark™ Declarative Pipelines
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Enterprise Technology, Consulting & Services |
| Technologies | Lakeflow |
| Skill Level | Beginner |
Learn how to build batch and streaming pipelines faster while improving correctness and reducing operational complexity with Apache Spark™ Declarative Pipelines.

Production Spark pipelines often require extensive orchestration code for dependency management, checkpointing, retries, and execution ordering—surrounding a relatively small amount of transformation logic. As pipelines scale, this scaffolding becomes increasingly difficult to maintain and evolve.

Introduced in Spark 4.1, Spark Declarative Pipelines (SDP) shifts this model by allowing developers to declare datasets and transformations while Spark constructs and manages the execution plan. By separating what a pipeline does from how it runs, SDP reduces boilerplate and accelerates time to production.

We’ll examine the architectural foundations of declarative development in Spark and how SDP handles dependency resolution, parallelization, checkpoint coordination, and failure recovery. We’ll also cover incremental processing and emerging testing patterns for declarative pipelines.
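To make the declarative model concrete, the fragment below is a minimal sketch of a pipeline definition in the style of Spark Declarative Pipelines. The module path (`pyspark.pipelines`), decorator names, and table names here are illustrative assumptions, not a definitive API reference; the key idea is that each function declares a dataset, and Spark infers the dependency graph and execution order from the table references rather than from hand-written orchestration code.

```python
# Illustrative sketch of a declarative pipeline definition.
# Module path and decorator names are assumptions for illustration;
# consult the Spark 4.1 documentation for the exact API.
from pyspark import pipelines as dp
from pyspark.sql import functions as F

@dp.materialized_view
def raw_orders():
    # Declares a dataset; Spark decides when and how to compute it.
    return spark.read.table("source_catalog.sales.orders")

@dp.materialized_view
def daily_revenue():
    # Referencing raw_orders lets Spark infer the dependency edge,
    # so execution ordering needs no explicit orchestration code.
    return (
        spark.read.table("raw_orders")
        .groupBy(F.to_date("order_ts").alias("order_date"))
        .agg(F.sum("amount").alias("revenue"))
    )
```

Because the definitions only state *what* each dataset is, concerns such as retries, checkpointing, and parallelization are handled by the pipeline runtime rather than by the author.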
Session Speakers
Lisa Cao
Staff Developer Relations
Databricks
Andreas Neumann
Databricks