Data + AI Summit 2023

Ray on Apache Spark™

Tuesday, June 27 @4:00 PM


Ray and its associated native libraries make scaling ML projects effortless. With only minor modifications to existing single-machine code, Ray makes converting ML workloads to a distributed computation environment simple and intuitive. Running Ray on Spark combines the infrastructure of Spark with the massive scalability of Ray for ML workloads, so you can scale your ML work with the libraries you want without switching to a different implementation paradigm or set of libraries to achieve extreme scale.


This session is a collaboration between Databricks Engineering and Anyscale Engineering. We will cover the joint effort to build an officially supported integration between Spark and Ray, and showcase how you can start a Ray cluster from within Spark and leverage it for many ML use cases.
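As a rough illustration of starting a Ray cluster from within Spark, the sketch below uses the `ray.util.spark` helpers `setup_ray_cluster` and `shutdown_ray_cluster`. It assumes an environment where both Ray and an active Spark session are available (for example a Databricks notebook). The keyword `num_worker_nodes` is the name used in Ray releases from around the time of this session; argument names have changed in later releases, so check the version you have installed. The `square` workload and the `run_on_ray_on_spark` wrapper are hypothetical names chosen for illustration, not something shown in the session.

```python
from typing import List


def square(x: int) -> int:
    """Toy single-machine function standing in for real ML work."""
    return x * x


def run_on_ray_on_spark(values: List[int], num_worker_nodes: int = 2) -> List[int]:
    """Start Ray on the Spark cluster, fan out `square`, then clean up.

    Assumption: Ray and an active Spark session are available.
    """
    # Imported lazily so this file still loads where Ray is not installed.
    import ray
    from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

    # Launch Ray worker nodes on the Spark cluster's resources.
    # Assumption: `num_worker_nodes` matches the installed Ray version.
    setup_ray_cluster(num_worker_nodes=num_worker_nodes)
    try:
        # Connect this process to the Ray cluster that was just started.
        ray.init()
        # Wrap the plain function as a Ray remote function.
        remote_square = ray.remote(square)
        # Submit one task per value and gather the results in order.
        return ray.get([remote_square.remote(v) for v in values])
    finally:
        # Disconnect, then release the Spark resources held by Ray.
        ray.shutdown()
        shutdown_ray_cluster()
```

In a notebook, calling `run_on_ray_on_spark([1, 2, 3])` would run the three tasks on the Ray workers. The existing Ray code (`ray.init`, `ray.remote`, `ray.get`) is unchanged; only the `setup_ray_cluster` call is specific to running on Spark.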


We'll dive into how to start the cluster from within a Databricks Notebook, as well as how to start the Ray dashboard and use it to inspect the performance of submitted tasks. During this session, we'll explain how cluster resources are allocated on Spark, as well as the general architecture of running Ray on Spark. We'll also showcase how to convert your existing Ray workloads to run on Spark with a single-line configuration change! We'll conclude with a brief overview of using Ray Tune to perform hyperparameter tuning of a model, showcasing how easy and powerful it is to use the APIs for ML use cases.
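To give a flavor of the Ray Tune step, here is a minimal sketch of a hyperparameter search. It is not the model or search space from the session: the `objective` function is a hypothetical toy loss, and `lr` is an illustrative parameter name. The sketch assumes Ray Tune is installed and that a Ray cluster is reachable (for example one already started on Spark); without one, Ray starts a local instance.

```python
from typing import Any, Dict


def objective(config: Dict[str, Any]) -> Dict[str, float]:
    """Toy trainable: a quadratic 'loss' that is smallest at lr == 0.1.

    A real trainable would fit a model and return its validation metric.
    """
    return {"loss": (config["lr"] - 0.1) ** 2}


def tune_learning_rate(num_samples: int = 20) -> Dict[str, Any]:
    """Search over `lr` with Ray Tune and return the best configuration."""
    # Imported lazily so this file still loads where Ray is not installed.
    from ray import tune

    tuner = tune.Tuner(
        objective,
        # Sample the learning rate log-uniformly between 1e-4 and 1.0.
        param_space={"lr": tune.loguniform(1e-4, 1.0)},
        # Minimize the "loss" value returned by the trainable.
        tune_config=tune.TuneConfig(
            metric="loss", mode="min", num_samples=num_samples
        ),
    )
    results = tuner.fit()
    # Uses the metric and mode given in TuneConfig above.
    return results.get_best_result().config
```

Each sampled configuration runs as its own trial, so the trials can be spread across the Ray workers; `tune_learning_rate()` returns the sampled configuration with the lowest loss.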


  • Breakout


  • In Person


  • DSML: ML Use Cases / Technologies, Databricks Experience (DBX)


  • Enterprise Technology


  • Intermediate


  • 40 min

Session Speakers


Ben Wilson

Principal Specialist Solutions Architect



Jiajun Yao

Software Engineer

