Session

Evolving Apache Spark Structured Streaming in Open Source: A Year in Review and the Road Ahead!

Overview

Experience: In Person
Track: Data Engineering & Streaming
Industry: Enterprise Technology, Communications, Media & Entertainment, Travel & Hospitality
Technologies: Lakeflow
Skill Level: Intermediate

Apache Spark Structured Streaming has quickly become the foundational open-source technology for powering mission-critical streaming ingestion and ETL pipelines worldwide. The Spark 4.1 release introduces powerful updates, most notably a new Real-Time Mode that lets operational pipelines process data with millisecond latencies, all within the Spark ecosystem. Stateful transformations have been significantly strengthened through TransformWithState and stream-stream join enhancements, which offer broader coverage and higher performance. In addition, robustness improvements such as revamped locking and integrity verification make stateful queries more stable. Looking ahead to Spark 4.2, the roadmap includes state repartitioning, query evolution, hybrid sources, and other enhancements, further establishing Spark as the gold standard for all streaming workloads. Join us to explore these recent advancements and the exciting future of open-source Structured Streaming!

Session Speakers

Jerry Peng

Staff Software Engineer
Databricks

Anish Shrigondekar

Software Engineer
Databricks