GPU Accelerated Spark Connect
Overview
Experience | In Person |
---|---|
Type | Breakout |
Track | Data Engineering and Streaming |
Industry | Enterprise Technology, Retail and CPG - Food, Financial Services |
Technologies | Apache Spark |
Skill Level | Intermediate |
Duration | 40 min |
Spark Connect, first included for SQL/DataFrame API in Apache Spark 3.4 and recently extended to MLlib in 4.0, introduced a new way to run Spark applications over a gRPC protocol. This has many benefits, including easier adoption for non-JVM clients, version independence from applications and increased stability and security of the associated Spark clusters.
The recent Spark Connect extension for ML also included a plugin interface to configure enhanced server-side implementations of the MLlib algorithms when launching the server.
In this talk, we shall demonstrate how this new interface, together with Spark SQL’s existing plugin interface, can be used with NVIDIA GPU-accelerated plugins for ML and SQL to enable no-code change, end-to-end GPU acceleration of Spark ETL and ML applications over Spark Connect, with optimal performance up to 9x at 80% cost reduction compared to CPU baselines.
Session Speakers
Gera Shegalov
/Principal Distributed Systems Engineer
NVIDIA
Erik Ordentlich
/Sr. Manager
NVIDIA