Session
What's New in Apache Spark™ 4.1
Overview
| Experience | In Person |
|---|---|
| Track | Data Engineering & Streaming |
| Industry | Healthcare & Life Sciences, Manufacturing, Financial Services |
| Technologies | Lakeflow |
| Skill Level | Intermediate |
Apache Spark 4.1 continues the evolution of the world’s leading open-source data engine, accelerating developer productivity and making modern data engineering patterns first-class citizens. Its improvements reduce friction, improve performance, and expand what Spark can express natively.
In this session you’ll see what Spark 4.1 delivers:
- Declarative APIs that let you define outcomes while Spark manages pipeline execution, dependencies, and failure handling
- Real-Time Mode for streaming, enabling continuous, low-latency processing
- Improved Python performance and ergonomics, including Arrow-native UDFs/UDTFs and better debugging
- Expanded SQL expressiveness, with GA SQL Scripting, VARIANT support for semi-structured data, recursive CTEs, and new functions
- Mature Spark Connect, including GA Spark ML support for Python clients and improved stability at scale
This session is a must for teams building and operating data pipelines, analytics, and streaming workloads on Spark at scale.
Session Speakers
Daniel Tenedorio
Sr. Staff Software Engineer
Databricks
Wenchen Fan
Senior Staff Software Engineer
Databricks