Session

What’s New in Apache Spark™ 4.1?

Overview

Experience	In Person
Track	Data Engineering & Streaming
Industry	Enterprise Technology
Technologies	Databricks SQL
Skill Level	Intermediate

Explore the new features of Apache Spark 4.1, which brings major capabilities for both data engineering and analytics workflows. We introduce Spark Declarative Pipelines (SDP). SDP lets users define what their data pipelines should accomplish while Spark orchestrates execution, dependencies, parallelism and retries. We also cover the official Real-Time Mode in Structured Streaming, which provides continuous, sub-second, low-latency processing. The PySpark ecosystem gets extensive improvements, including Arrow-native UDFs/UDTFs, filter pushdown for Python Data Sources, and better Python worker logging. The release also enhances Spark Connect, with GA support for Spark ML on the Python client and improved stability for large workloads. SQL support expands with GA SQL Scripting, the VARIANT type with shredding optimizations, recursive CTEs, and new approximate data sketch functions. Overall, Spark 4.1 prioritizes higher-level abstractions, real-time performance, richer Python support, and more.

Session Speakers

Daniel Tenedorio

Wenchen Fan