Ray on Apache Spark™
Overview
Ray and its associated native libraries make scaling ML projects effortless. With only minor modifications to existing single-machine code, Ray makes converting ML workloads to a distributed computation environment simple, intuitive, and powerful. By combining the infrastructure of Spark with the massive scalability of Ray for ML workloads, running Ray on Spark lets you scale your ML work with the libraries you already use, without switching to a different implementation paradigm or set of libraries to achieve extreme scale.
This session is a collaboration between Databricks Engineering and Anyscale Engineering. We will cover the joint effort behind building an officially supported integration between Spark and Ray, and show how you can start a Ray cluster from within Spark and leverage it for many ML use cases.
We'll dive into how to start the cluster from within a Databricks notebook, how to launch the Ray dashboard, and how to use it to inspect the performance of submitted tasks. We'll explain how cluster resources are allocated on Spark and walk through the general architecture of running Ray on Spark. We'll also showcase how to convert your existing Ray workloads to run on Spark with a single configuration change. We'll conclude with a brief overview of using Ray Tune to perform hyperparameter tuning of a model, showing how easy and powerful the APIs are for ML use cases.
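To give a flavor of the APIs discussed, here is a minimal, hypothetical sketch of the workflow in a Databricks notebook, assuming a recent Ray 2.x release where the integration lives in ray.util.spark. Exact parameter names (for example, num_worker_nodes) vary between Ray versions, and the tuning objective is a toy stand-in for a real model.

import ray
from ray import tune
from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

# Start Ray worker nodes on top of the existing Spark cluster.
# (Parameter names differ across Ray versions; num_worker_nodes is an assumption here.)
setup_ray_cluster(num_worker_nodes=2)

# Connect the notebook (the driver) to the Ray-on-Spark cluster that was just started.
ray.init()

# A toy objective; Ray Tune evaluates it in parallel across the cluster.
def objective(config):
    return {"score": (config["x"] - 3) ** 2}

tuner = tune.Tuner(
    objective,
    param_space={"x": tune.uniform(-10.0, 10.0)},
    tune_config=tune.TuneConfig(metric="score", mode="min", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)

# Release the Ray workers' resources back to Spark when done.
shutdown_ray_cluster()

The Tune code itself is the same code you would run against a standalone Ray cluster; only the setup_ray_cluster call up front differs, which is the spirit of the single configuration change described above.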
Type
- Breakout
Experience
- In Person
Track
- DSML: ML Use Cases / Technologies, Databricks Experience (DBX)
Industry
- Enterprise Technology
Difficulty
- Intermediate
Duration
- 40 min