JUNE 26-29, 2023
SAN FRANCISCO + VIRTUAL

Deep Dive into the New Features of Apache Spark 3.2 and 3.3

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Lakes, Data Warehouses and Data Lakehouses

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 160

Duration

  • 80 min

Overview

Apache Spark has become the most widely used engine for executing data engineering, data science and machine learning workloads on single-node machines or clusters. Monthly Maven downloads of Spark have grown rapidly to 20 million.

We will talk about the higher-level features and improvements in Spark 3.2 and 3.3. The talk also dives deeper into the following features:
+ Introducing the pandas API on Apache Spark to unify the small-data and big-data APIs.
+ Completing the ANSI SQL compatibility mode to simplify migration of SQL workloads.
+ Productionizing adaptive query execution to speed up Spark SQL at runtime.
+ Introducing the RocksDB state store to make state processing more scalable.
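
For orientation, three of the features above map to Spark SQL configuration keys. The sketch below is not taken from the session; it collects those keys in a plain dictionary and renders them as `spark-submit` arguments. The names `SPARK_CONF` and `to_submit_args` are hypothetical helpers introduced only for illustration, and the values shown are one plausible setup rather than a recommendation. The pandas API on Spark is a Python module (`pyspark.pandas`) rather than a setting, so it does not appear here.

```python
# Illustrative sketch only: SPARK_CONF and to_submit_args are hypothetical names.
SPARK_CONF = {
    # ANSI SQL compatibility mode
    "spark.sql.ansi.enabled": "true",
    # Adaptive query execution (enabled by default since Spark 3.2)
    "spark.sql.adaptive.enabled": "true",
    # RocksDB-backed state store for Structured Streaming (Spark 3.2+)
    "spark.sql.streaming.stateStore.providerClass":
        "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider",
}


def to_submit_args(conf):
    """Render a config dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print the arguments as they would be appended to a spark-submit command.
    print(" ".join(to_submit_args(SPARK_CONF)))
```

The same key/value pairs could instead be passed to a `SparkSession` builder; a plain dictionary keeps the sketch runnable without a Spark installation.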

Session Speakers

Xiao Li

Databricks

Wenchen Fan

Databricks

Daniel Tenedorio

Sr. Staff Engineer

Databricks
