Han Wang is the tech lead of Lyft Machine Learning Platform, focusing on distributed computing and machine learning solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon and Quantlab. Han is the founder of the Fugue project, which aims to democratize distributed computing and machine learning.
May 27, 2021 04:25 PM PT
Hyperparameter tuning is critical in model development, and its general form, parameter tuning with an objective function, is also widely used in industry. Meanwhile, Apache Spark can handle massive parallelism, and Apache Spark ML is a solid machine learning solution.
But we have not seen a general and intuitive distributed parameter tuning solution based on Apache Spark. Why?
In this talk, we are going to show how using Fugue-Tune and Apache Spark together can eliminate these pain points.
In the demo, you will see how to do any type of tuning in a consistent, intuitive, scalable and minimal way. You will also see its performance demonstrated live.
June 25, 2020 05:00 PM PT
Have you struggled to choose among computing and machine learning frameworks such as Spark, Dask, scikit-learn and TensorFlow for your ETL and machine learning projects, and thought about unifying them into one ecosystem? In this talk, we will present Fugue, a framework we developed for that purpose. It is an abstraction layer on top of different frameworks that also provides a SQL-like language, highly extensible with Python, for representing your pipelines from end to end. With Fugue, it is much easier and faster to create reliable, performant and portable pipelines than with native Spark, especially for non-expert users.
We will demonstrate how we implemented the Node2Vec algorithm on top of Fugue so that it can run on different computing frameworks and process graphs with 100 million vertices and 3 billion edges in a few hours using Spark as the backend.
We have also built a unified interactive environment based on Kubernetes, Spark and Fugue, and will demonstrate the performance improvements on projects migrated to this system. We will also discuss future plans for the Fugue project, including Fugue ML and Fugue Streaming. Our goal is to create a unified ecosystem for distributed computing and machine learning.