Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS Library-continues

GPU acceleration has been at the heart of scientific computing and artificial intelligence for many years. GPUs provide the computational power needed for the most demanding applications, such as deep neural networks and nuclear or weather simulation. Since the launch of RAPIDS in mid-2018, this vast computational resource has become available for data science workloads too. The RAPIDS toolkit, which is now available on the Databricks Unified Analytics Platform, is a GPU-accelerated drop-in replacement for utilities such as pandas, NumPy, scikit-learn, and XGBoost.

Through its use of Dask wrappers, the platform allows for true large-scale computation with minimal, if any, code changes. The goal of this talk is to discuss RAPIDS, its functionality, and its architecture, as well as the way it integrates with Spark, in many cases providing several orders of magnitude of acceleration over its CPU-only counterparts.


See More Spark + AI Summit Europe 2019 Videos

Miguel Martínez
About Miguel Martínez


Miguel Martínez is a Deep Learning Solution Architect at NVIDIA, where he concentrates on RAPIDS. Previously, he mentored students at Udacity's Artificial Intelligence Nanodegree. He has a strong background in financial services, mainly focused on payments and channels. As a constant and steadfast learner, he is always up for new challenges.

About Thomas Graves


Thomas Graves is a distributed systems software engineer at NVIDIA, where he concentrates on accelerating Spark. He is a committer and PMC member on Apache Spark and Apache Hadoop. Previously, he worked at Yahoo on the Big Data Platform team, working on Apache Spark, Hadoop, YARN, Storm, and Kafka.