Data + AI Summit 2024
June 10–13, 2024
San Francisco + Virtual

An API for Deep Learning Inferencing on Apache Spark™

Thursday, June 29 @3:30 PM


Apache Spark is a popular distributed framework for big data processing. It is commonly used for ETL (extract, transform, and load) across large datasets, and today the transform stage often includes applying deep learning models to the data. For example, common models can be used for image classification, sentiment analysis of text, language translation, anomaly detection, and many other use cases. Applying these models within Spark is possible today by combining PySpark, pandas UDFs, and a lot of glue code. That glue code is often difficult to get right, because it requires expertise across multiple domains: deep learning frameworks, PySpark APIs, the internal behavior of pandas UDFs, and performance optimization.
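For context, the kind of glue code described above can be sketched as an iterator-of-Series pandas UDF. This is a minimal sketch under stated assumptions, not the session's code: `load_model`, `infer_batches`, `MINI_BATCH_SIZE`, and `make_spark_udf` are hypothetical names, and the "model" is a fixed linear layer with a sigmoid standing in for a real deep learning framework. It shows the pieces that are easy to get wrong: loading the model once per partition instead of once per batch, converting a column of array-valued rows into a dense tensor, splitting it into model-sized mini-batches, and returning a Series aligned with the input.

```python
from typing import Iterator

import numpy as np
import pandas as pd

# Hypothetical mini-batch size fed to the model; real values depend on the model and GPU memory.
MINI_BATCH_SIZE = 2


def load_model():
    """Hypothetical stand-in for loading a real DL model from a checkpoint.

    Returns a callable mapping a (batch, 3) float32 array to a (batch,) array
    of scores: a fixed linear layer followed by a sigmoid.
    """
    weights = np.array([0.5, -0.25, 1.0], dtype=np.float32)

    def predict(batch: np.ndarray) -> np.ndarray:
        logits = batch @ weights
        return 1.0 / (1.0 + np.exp(-logits))

    return predict


def infer_batches(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    """Body of an iterator-of-Series pandas UDF.

    Each incoming Series is one Arrow batch whose rows are feature arrays.
    The model is loaded once per iterator (one per partition), not per batch.
    """
    model = load_model()
    for series in batches:
        if series.empty:
            yield pd.Series([], dtype=np.float32, index=series.index)
            continue
        # Convert the column of per-row arrays into one dense (rows, features) tensor.
        features = np.stack(series.to_numpy()).astype(np.float32)
        # Run the model on mini-batches to bound memory use.
        scores = [
            model(features[i : i + MINI_BATCH_SIZE])
            for i in range(0, len(features), MINI_BATCH_SIZE)
        ]
        # Keep the input index so output rows line up with input rows.
        yield pd.Series(np.concatenate(scores), index=series.index)


def make_spark_udf():
    """Wrap the batch logic as a Spark pandas UDF (requires PySpark).

    PySpark is imported lazily so the pure-pandas logic above can run without it.
    """
    from pyspark.sql.functions import pandas_udf

    # The Iterator[pd.Series] -> Iterator[pd.Series] type hints select the
    # iterator-of-Series UDF variant; "float" is the Spark return type.
    return pandas_udf(infer_batches, "float")
```

In a Spark job this might be applied as `df.withColumn("score", make_spark_udf()("features"))`, where `features` is a hypothetical array<float> column. Every one of these steps has to be rewritten for each new model and framework, which is the burden the session's API aims to remove.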


In this session, we introduce a new, simplified API for deep learning inference on Spark. Added in SPARK-40264 as a collaboration between NVIDIA and Databricks, it seeks to standardize and open source this glue code, making deep learning inference integrations easier for everyone. We discuss its design and demonstrate its usage across multiple deep learning frameworks and models.
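To our understanding, the API from SPARK-40264 shipped in Apache Spark 3.4 as `pyspark.ml.functions.predict_batch_udf`. The sketch below assumes its documented signature, `predict_batch_udf(make_predict_fn, *, return_type, batch_size, input_tensor_shapes=None)`: the caller supplies only a factory that loads a model and returns a NumPy-in, NumPy-out predict function, and the API handles batching and the pandas/NumPy conversion. The model (a fixed linear layer plus sigmoid), `build_udf`, and the `features` column are hypothetical.

```python
import numpy as np


def make_predict_fn():
    """Model-loading factory; the API invokes it on executors, not the driver.

    Hypothetical model: a fixed linear layer plus sigmoid standing in for a
    real TensorFlow or PyTorch model loaded from a checkpoint.
    """
    weights = np.array([0.5, -0.25, 1.0], dtype=np.float32)

    def predict(inputs: np.ndarray) -> np.ndarray:
        # inputs: (batch_size, 3) array built from the input column.
        # Returns one score per row, matching a scalar FloatType return type.
        return 1.0 / (1.0 + np.exp(-(inputs @ weights)))

    return predict


def build_udf():
    """Create the inference UDF (requires PySpark 3.4+)."""
    from pyspark.ml.functions import predict_batch_udf
    from pyspark.sql.types import FloatType

    return predict_batch_udf(
        make_predict_fn,
        return_type=FloatType(),
        batch_size=64,  # rows per model call; tune for the model and hardware
        input_tensor_shapes=[[3]],  # each row's array is reshaped to shape [3]
    )
```

The UDF would then be applied like any other column function, for example `df.withColumn("score", build_udf()("features"))`. The design choice is that the model author writes only framework-specific loading and prediction code, while Spark-specific concerns stay inside the shared API.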


  • Breakout
  • In Person
  • DSML: Production ML / MLOps
  • Enterprise Technology
  • Intermediate
  • 40 min

Session Speakers

Lee Yang

Sr. Principal SW Engineer