Session

GPU Accelerated Spark Connect

Overview

Experience: In Person
Type: Breakout
Track: Data Engineering and Streaming
Industry: Enterprise Technology, Retail and CPG - Food, Financial Services
Technologies: Apache Spark
Skill Level: Intermediate
Duration: 40 min

Spark Connect, first introduced for the SQL/DataFrame API in Apache Spark 3.4 and recently extended to MLlib in 4.0, provides a new way to run Spark applications over a gRPC protocol. Its benefits include easier adoption by non-JVM clients, decoupling of application versions from the Spark version running on the server, and improved stability and security of the associated Spark clusters.

The recent Spark Connect extension for ML also includes a plugin interface for configuring enhanced server-side implementations of MLlib algorithms when the server is launched.

In this talk, we demonstrate how this new interface, together with Spark SQL's existing plugin interface, can be used with NVIDIA GPU-accelerated plugins for ML and SQL to enable end-to-end GPU acceleration of Spark ETL and ML applications over Spark Connect with no code changes, with up to 9x better performance at 80% lower cost than CPU baselines.

Session Speakers

Gera Shegalov

Principal Distributed Systems Engineer
NVIDIA

Erik Ordentlich

Sr. Manager
NVIDIA