SESSION

Architecture Analysis for ETL Processing: CPU vs GPU


OVERVIEW

EXPERIENCE: In Person
TYPE: Lightning Talk
TRACK: Data Lakehouse Architecture
INDUSTRY: Enterprise Technology
TECHNOLOGIES: Apache Spark, ETL
SKILL LEVEL: Advanced
DURATION: 20 min

GPUs are well known as accelerators for deep learning (DL) and machine learning (ML) workloads. This session describes how GPUs can also accelerate batch ETL operations. We review the CPU and GPU architectures, including an overview of their memory subsystems, and present a roofline analysis of individual database operations such as joins, aggregations, and data compression. We discuss why these operations are well suited to GPU acceleration and can achieve up to an order-of-magnitude speedup. Using industry-standard benchmark queries, we demonstrate full end-to-end SQL query acceleration on GPUs in a prototype query engine and compare the results with existing CPU solutions. Finally, we review the performance of the same queries at 3 TB scale with the RAPIDS Accelerator for Apache Spark™, a plugin for Apache Spark™ that enables GPU acceleration with no code changes.
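The roofline idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the speakers' analysis: the machine peaks and the arithmetic intensity below are hypothetical placeholders, not figures from the talk. The model caps attainable throughput at the lower of peak compute and (arithmetic intensity × memory bandwidth). For low-intensity operators such as scans and hash aggregations, both processors sit under the memory roof, so the estimated speedup tracks the ratio of their memory bandwidths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Peak capabilities of a processor (hypothetical values only)."""
    name: str
    peak_ops_per_s: float    # peak compute throughput, operations per second
    peak_bytes_per_s: float  # peak memory bandwidth, bytes per second


def attainable_ops_per_s(machine: Machine, intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic.

    `intensity` is arithmetic intensity in operations per byte moved.
    """
    return min(machine.peak_ops_per_s, intensity * machine.peak_bytes_per_s)


def is_memory_bound(machine: Machine, intensity: float) -> bool:
    # Ridge point: the intensity at which the memory roof meets the compute roof.
    ridge = machine.peak_ops_per_s / machine.peak_bytes_per_s
    return intensity < ridge


# Hypothetical machines (assumed numbers); substitute real datasheet values.
cpu = Machine("cpu", peak_ops_per_s=2e12, peak_bytes_per_s=200e9)
gpu = Machine("gpu", peak_ops_per_s=20e12, peak_bytes_per_s=2000e9)

# Assumed intensity for a scan-heavy operator: about 0.25 operations per byte.
intensity = 0.25

cpu_perf = attainable_ops_per_s(cpu, intensity)
gpu_perf = attainable_ops_per_s(gpu, intensity)

if __name__ == "__main__":
    print(f"CPU memory-bound: {is_memory_bound(cpu, intensity)}")
    print(f"GPU memory-bound: {is_memory_bound(gpu, intensity)}")
    print(f"Estimated speedup: {gpu_perf / cpu_perf:.1f}x")
```

With these placeholder numbers both machines are memory-bound, and the estimate reduces to the 10x bandwidth ratio. Raising the intensity past a machine's ridge point moves it onto its compute roof instead.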

SESSION SPEAKERS

Jason Lowe

Distinguished System Software Engineer
NVIDIA

Nikolay Sakharnykh

Senior AI Developer Technology Manager
NVIDIA