Data + AI Summit 2022

Distributed Machine Learning at Lyft

On Demand


  • Session
  • Hybrid
  • Data Science, Machine Learning and MLOps
  • Intermediate
  • Moscone South | Upper Mezzanine | 156
  • 35 min

Overview

Data collection, preprocessing, and feature engineering are the fundamental steps in any machine learning pipeline. After feature engineering, parallelizing training across multiple low-cost machines reduces both cost and time. Training on larger datasets almost always produces better models, and training in a distributed manner speeds up hyperparameter tuning. How can we unify these stages of the ML pipeline in a single distributed training platform, and do so on Kubernetes?

Our ML platform is built entirely on Kubernetes because of its scalability and the speed with which it bootstraps resources. In this talk we will demonstrate how Lyft uses Spark on Kubernetes and Fugue (our home-grown data processing and parameter optimization framework built on top of Spark) to design a holistic, end-to-end ML pipeline system that gives customers of our ML Platform a distributed training experience.

We will also take a deep dive into how we leverage these open source technologies, with Spark underneath, while abstracting and hiding their complexity from our Data Scientists and Research Scientists so that they can focus only on the business logic for their models through simple Pythonic APIs and SQL. We let the users focus on “what to do” and the platform takes care of “how to do it”. Using Spark on Kubernetes has helped us achieve large-scale data processing at 90% less cost, at times bringing processing time down from 2 hours to less than 20 minutes.
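
The split between “what to do” and “how to do it” can be illustrated with a small sketch. This is not the Fugue or Spark API and not Lyft's implementation: every name here (`objective`, `grid`, `run_local`, `run_parallel`, `tune`) is hypothetical, and a standard-library thread pool stands in for a distributed backend. The user writes only the scoring function; the runner that executes it is swapped without touching that function.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Runner = Callable[[Callable[[Params], float], List[Params]], List[float]]


def objective(params: Params) -> float:
    """User code ("what to do"): score one hyperparameter combination.

    Hypothetical toy loss whose minimum is at lr=0.1, reg=1.0.
    """
    return (params["lr"] - 0.1) ** 2 + (params["reg"] - 1.0) ** 2


def grid(space: Dict[str, List[float]]) -> List[Params]:
    """Expand a search space into the full list of parameter combinations."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def run_local(fn: Callable[[Params], float], candidates: List[Params]) -> List[float]:
    """Platform code ("how to do it"): evaluate candidates one by one."""
    return [fn(c) for c in candidates]


def run_parallel(fn: Callable[[Params], float], candidates: List[Params]) -> List[float]:
    """Platform code: evaluate candidates concurrently.

    A thread pool is only a stand-in for a cluster backend in this sketch.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fn, candidates))  # map preserves input order


def tune(
    fn: Callable[[Params], float],
    space: Dict[str, List[float]],
    runner: Runner = run_local,
) -> Tuple[Params, float]:
    """Return the lowest-scoring combination; the runner decides where work runs."""
    candidates = grid(space)
    scores = runner(fn, candidates)
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


space = {"lr": [0.01, 0.1, 1.0], "reg": [0.5, 1.0, 2.0]}
best_local = tune(objective, space)
best_parallel = tune(objective, space, runner=run_parallel)
```

Both calls to `tune` use the same `objective`; only the `runner` argument changes, which is the kind of separation the abstract describes.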

Session Speakers

Anindya Saha

ML Platform Software Engineer

Lyft Inc.

Han Wang

Senior Staff Engineer

Lyft Inc.
