Data + AI Summit 2023
JUNE 26-29, 2023
SAN FRANCISCO + VIRTUAL
Deep Dive into the New Features of Apache Spark 3.2 and 3.3

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Lakes, Data Warehouses and Data Lakehouses

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 160

Duration

  • 80 min

Overview

Apache Spark has become the most widely used engine for data engineering, data science, and machine learning on single-node machines or clusters. Monthly Maven downloads of Spark have grown rapidly to 20 million.

This talk covers the higher-level features and improvements in Spark 3.2 and 3.3, and dives deeper into the following features:
+ Introducing the pandas API on Apache Spark to unify the small-data and big-data APIs.
+ Completing the ANSI SQL compatibility mode to simplify migration of SQL workloads.
+ Productionizing adaptive query execution to speed up Spark SQL at runtime.
+ Introducing the RocksDB state store to make state processing more scalable.
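
As a rough illustration of how three of the features listed above are switched on, the sketch below collects the corresponding Spark configuration keys and renders them as `spark-submit --conf` arguments. It is not taken from the session itself: the configuration keys are standard Spark settings, while the names `FEATURE_CONF` and `to_submit_flags` are hypothetical helpers introduced only for this example.

```python
# Minimal sketch: Spark settings related to the features in this talk.
# FEATURE_CONF and to_submit_flags are hypothetical names for illustration.
FEATURE_CONF = {
    # ANSI SQL compatibility mode (disabled by default in Spark 3.2 and 3.3).
    "spark.sql.ansi.enabled": "true",
    # Adaptive query execution (enabled by default since Spark 3.2).
    "spark.sql.adaptive.enabled": "true",
    # RocksDB-backed state store provider for Structured Streaming.
    "spark.sql.streaming.stateStore.providerClass":
        "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider",
}


def to_submit_flags(conf):
    """Render a config dict as a flat list of spark-submit arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # Print the flags on one line, ready to paste after `spark-submit`.
    print(" ".join(to_submit_flags(FEATURE_CONF)))
```

The pandas API on Spark needs no configuration key; in Spark 3.2 and later it is available in PySpark through `import pyspark.pandas as ps`.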

Session Speakers

Xiao Li

Databricks

Wenchen Fan

Databricks

Daniel Tenedorio

Sr. Staff Engineer

Databricks
