Session

From Repetition to Reuse: The Evolution of Apache Spark™ Declarative Pipelines

Overview

Experience: In Person
Track: Data Engineering & Streaming
Industry: Enterprise Technology, Consulting & Services
Technologies: Lakeflow
Skill Level: Beginner

Learn how to build batch and streaming pipelines faster while improving correctness and reducing operational complexity with Apache Spark™ Declarative Pipelines.

Production Spark pipelines often require extensive orchestration code for dependency management, checkpointing, retries, and execution ordering, all surrounding a relatively small amount of transformation logic. As pipelines scale, this scaffolding becomes increasingly difficult to maintain and evolve.

Introduced in Spark 4.1, Spark Declarative Pipelines (SDP) shifts this model by allowing developers to declare datasets and transformations while Spark constructs and manages the execution plan. By separating what a pipeline does from how it runs, SDP reduces boilerplate and accelerates time to production.

We’ll examine the architectural foundations of declarative development in Spark and how SDP handles dependency resolution, parallelization, checkpoint coordination, and failure recovery. We’ll also cover incremental processing and emerging testing patterns for declarative pipelines.
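To make "declare datasets, let Spark plan the run" concrete, here is a minimal sketch of what an SDP pipeline definition can look like. It assumes Spark 4.1's `pyspark.pipelines` decorator module and a pipeline runner that discovers these definitions; the dataset names and the source table `orders` are hypothetical, and the exact decorator names may differ from your Spark version.

```python
# Hypothetical SDP pipeline definition file, discovered and executed by the
# pipeline runner (not run directly as a script). Assumes Spark 4.1+.
from pyspark import pipelines as dp
from pyspark.sql.functions import col

@dp.materialized_view
def cleaned_orders():
    # 'spark' is the session provided by the pipeline runtime.
    # SDP infers that this dataset depends on the 'orders' source table.
    return spark.read.table("orders").where(col("amount") > 0)

@dp.materialized_view
def orders_by_customer():
    # Referencing cleaned_orders by name lets SDP wire the dependency edge;
    # execution order, retries, and checkpoints are managed by the engine.
    return spark.read.table("cleaned_orders").groupBy("customer_id").count()
```

Note that there is no orchestration code here: no explicit DAG wiring, checkpoint paths, or retry loops. The engine derives all of that from the declarations.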
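The dependency resolution and parallelization the abstract mentions can be illustrated with a small, self-contained sketch (plain Python, not the SDP API): given datasets that declare their upstream inputs, a topological sort yields execution stages whose members could be refreshed in parallel. The dataset names here are hypothetical.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical declared pipeline: each dataset lists its upstream inputs,
# mirroring the DAG that SDP infers from dataset definitions.
declared = {
    "raw_orders": [],
    "raw_customers": [],
    "cleaned_orders": ["raw_orders"],
    "orders_by_customer": ["cleaned_orders", "raw_customers"],
}

def execution_stages(deps):
    """Group datasets into stages; datasets within one stage have no
    dependencies on each other and could run in parallel."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    stages = []
    while ts.is_active():
        ready = list(ts.get_ready())   # all nodes whose inputs are done
        stages.append(sorted(ready))
        ts.done(*ready)
    return stages

print(execution_stages(declared))
# → [['raw_customers', 'raw_orders'], ['cleaned_orders'], ['orders_by_customer']]
```

The point of the sketch is the inversion of responsibility: the developer supplies only the declarations (`declared`), and ordering falls out of the graph rather than hand-written orchestration code.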

Session Speakers


Lisa Cao

Staff Developer Relations
Databricks


Andreas Neumann

Databricks