Session

What's New in Apache Spark™ 4.1

Overview

Experience: In Person
Track: Data Engineering & Streaming
Industry: Healthcare & Life Sciences, Manufacturing, Financial Services
Technologies: Lakeflow
Skill Level: Intermediate

Apache Spark 4.1 continues the evolution of the world’s leading open-source data engine, accelerating developer productivity and making modern data engineering patterns first-class citizens. Its improvements reduce friction, improve performance, and expand what Spark can express natively. In this session you’ll see what Spark 4.1 adds:

  • Declarative APIs that let you define outcomes while Spark manages pipeline execution, dependencies, and failure handling
  • Real-Time Mode for streaming, enabling continuous, low-latency processing
  • Improved Python performance and ergonomics, including Arrow-native UDFs/UDTFs and better debugging
  • Expanded SQL expressiveness, with GA SQL Scripting, VARIANT support for semi-structured data, recursive CTEs, and new functions
  • A more mature Spark Connect, including GA Spark ML support for Python clients and improved stability at scale

This session is a must for teams building and operating data pipelines, analytics, and streaming workloads on Spark at scale.

Session Speakers

Daniel Tenedorio

Sr. Staff Software Engineer
Databricks

Wenchen Fan

Senior Staff Software Engineer
Databricks