Session
What’s New in Apache Spark™ 4.0?

Overview
Thursday, June 12, 12:30 pm
Detail | Value |
---|---|
Experience | In Person |
Type | Breakout |
Track | Data Engineering and Streaming |
Industry | Enterprise Technology |
Technologies | Apache Spark |
Skill Level | Intermediate |
Duration | 40 min |
Join this session for a concise tour of Apache Spark™ 4.0’s most notable enhancements:
- SQL features: ANSI mode by default, SQL scripting, SQL pipe syntax, SQL UDFs, session variables, view schema evolution, etc.
- Data types: VARIANT type, string collation
- Python features: Python data source, plotting API, etc.
- Streaming improvements: State store data source, state store checkpoint v2, arbitrary stateful processing v2, etc.
- Spark Connect improvements: More API coverage, thin client, unified Scala interface, etc.
- Infrastructure: Better error messages, structured logging, support for new Java/Scala versions, etc.
Whether you’re a seasoned Spark user or new to the ecosystem, this talk will prepare you to leverage Spark 4.0’s latest innovations for modern data and AI pipelines.
Session Speakers
Daniel Tenedorio
Sr. Staff Software Engineer
Databricks
Wenchen Fan
Senior Staff Software Engineer
Databricks