Deep Dive into the New Features of Apache Spark 3.2 and 3.3
On Demand
Type
- Session
Format
- Hybrid
Track
- Data Lakes, Data Warehouses and Data Lakehouses
Difficulty
- Intermediate
Room
- Moscone South | Upper Mezzanine | 160
Duration
- 80 min
Overview
Apache Spark has become the most widely used engine for data engineering, data science, and machine learning workloads on single-node machines or clusters. Monthly Maven downloads of Spark have grown rapidly to 20 million.
This talk covers the high-level features and improvements in Spark 3.2 and 3.3, and dives deeper into the following:
+ Introducing the pandas API on Apache Spark to unify the small-data and big-data APIs.
+ Completing the ANSI SQL compatibility mode to simplify the migration of SQL workloads.
+ Productionizing adaptive query execution to speed up Spark SQL at runtime.
+ Introducing the RocksDB state store to make stateful processing more scalable.
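Three of these features are controlled through Spark configuration properties, and the pandas API on Spark is a separate module, `pyspark.pandas`. The snippet below is a minimal sketch, assuming PySpark 3.2 or later. It collects the relevant properties and applies them to a `SparkSession` builder. The dictionary `SPARK_32_FEATURE_CONFS` and the helper `apply_feature_confs` are hypothetical names for illustration. The property keys and the RocksDB provider class come from Spark itself.

```python
# A minimal sketch, assuming PySpark 3.2+. SPARK_32_FEATURE_CONFS and
# apply_feature_confs are hypothetical names, not part of Spark's API.

# Spark SQL / Structured Streaming properties related to the features above.
SPARK_32_FEATURE_CONFS = {
    # ANSI SQL compliance mode: invalid casts and arithmetic overflow raise
    # errors instead of silently returning NULL or wrapping around.
    "spark.sql.ansi.enabled": "true",
    # Adaptive query execution re-optimizes plans at runtime using shuffle
    # statistics (on by default since Spark 3.2; set explicitly here).
    "spark.sql.adaptive.enabled": "true",
    # RocksDB-backed state store for stateful Structured Streaming queries.
    "spark.sql.streaming.stateStore.providerClass": (
        "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider"
    ),
}


def apply_feature_confs(builder, confs=None):
    """Call builder.config(key, value) for each entry and return the builder.

    `builder` is expected to be a pyspark.sql.SparkSession.Builder, whose
    config() method returns the builder itself, so calls can be chained.
    """
    confs = SPARK_32_FEATURE_CONFS if confs is None else confs
    for key, value in confs.items():
        builder = builder.config(key, value)
    return builder


# Example usage (requires pyspark to be installed):
#   from pyspark.sql import SparkSession
#   import pyspark.pandas as ps  # pandas API on Spark, added in 3.2
#   spark = apply_feature_confs(SparkSession.builder.appName("demo")).getOrCreate()
#   psdf = ps.DataFrame({"x": [1, 2, 3]})  # pandas-like syntax, Spark execution
#   print(psdf["x"].sum())
```

Keeping the properties in one mapping makes it easy to toggle a single feature (for example, turning ANSI mode off while migrating legacy queries) without touching the rest of the session setup.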