Session

What’s New in Apache Spark™ 4.1?

Overview

Experience: In Person
Track: Data Engineering & Streaming
Industry: Enterprise Technology
Technologies: Databricks SQL
Skill Level: Intermediate

Explore the new features of Apache Spark 4.1, which brings major capabilities to both data engineering and analytics workflows. Spark Declarative Pipelines (SDP) let users define what their data pipelines should accomplish while Spark orchestrates execution, dependencies, parallelism and retries. An official Real-Time Mode in Structured Streaming enables sub-second, continuous low-latency processing. The PySpark ecosystem gains Arrow-native UDFs/UDTFs, Python Data Source filter pushdown, and better Python worker logging. The release also brings Spark Connect enhancements, including GA support for Spark ML on the Python client and increased stability for large workloads, and expands SQL support with GA SQL Scripting, the VARIANT type with shredding optimizations, recursive CTEs, and new approximate data sketch functions. Overall, Spark 4.1 prioritizes higher-level abstractions, real-time performance, and richer Python support.

Session Speakers

Wenchen Fan

Senior Staff Software Engineer
Databricks

Daniel Tenedorio

Sr. Staff Software Engineer
Databricks