Databricks Solutions Architect and ex-McKinsey Machine Learning Engineer focused on productionizing machine learning at scale.
June 24, 2020 05:00 PM PT
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike software projects, ML projects cannot simply be handed off once they are successfully delivered and deployed: they must be continuously monitored to verify that model performance still satisfies all requirements. In most ML use cases, the training set is updated over time, which can affect model performance. In addition, most models require data pre- and post-processing at runtime, which makes deployment even more challenging. In this talk, we will show how MLflow can be used to build an automated CI/CD pipeline that deploys a new version of a model, and the code around it, to production. We will also show how the same approach can be used in the training pipeline to retrain the model when new data arrives and deploy the new version only if it satisfies all requirements.
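As a hedged illustration of the kind of gating step such a pipeline might contain (not the exact implementation from the talk), the sketch below registers a newly trained model and promotes it to Production only if it beats the current Production model on a validation metric. The model name "churn_model" and the metric "val_accuracy" are illustrative assumptions.

```python
# A minimal sketch of a promotion gate built on the MLflow Model Registry.
# Model name and metric are hypothetical, not from the talk.
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()
MODEL_NAME = "churn_model"  # hypothetical registered model name

def promote_if_better(candidate_run_id: str, metric: str = "val_accuracy") -> bool:
    """Promote the candidate run's model to Production if it improves on the metric."""
    candidate_score = client.get_run(candidate_run_id).data.metrics[metric]

    # Compare against the model version currently serving in Production, if any.
    prod_versions = client.get_latest_versions(MODEL_NAME, stages=["Production"])
    if prod_versions:
        prod_metrics = client.get_run(prod_versions[0].run_id).data.metrics
        if candidate_score <= prod_metrics.get(metric, float("-inf")):
            return False  # requirement not satisfied; keep the current model

    # Register the candidate model and move it into the Production stage.
    version = mlflow.register_model(f"runs:/{candidate_run_id}/model", MODEL_NAME)
    client.transition_model_version_stage(MODEL_NAME, version.version, stage="Production")
    return True
```

A CI/CD job or a scheduled retraining pipeline could call this function after each training run, so deployment happens only when the new model meets the bar.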
October 16, 2019 05:00 PM PT
Does more data always improve ML models? Is it better to use distributed ML instead of single node ML?
In this talk I will show that while more data often improves DL models in high-variance problem spaces (with semi-structured or unstructured data) such as NLP, image, and video, more data does not significantly help in high-bias problem spaces, where traditional ML is more appropriate. Additionally, even in the deep learning domain, single-node models can still outperform distributed models via transfer learning.
Data scientists have pain points around running many models in parallel, automating the experimental setup, and getting others within an organization (especially analysts) to use their models. Databricks addresses these problems with pandas UDFs, the ML Runtime, and MLflow, as sketched below.
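The sketch below shows, under assumptions and not as the speakers' exact code, how a grouped pandas function can train one model per group in parallel on Spark while logging each run to MLflow. The Spark DataFrame `sales`, its columns, and the per-store regression task are hypothetical.

```python
# A minimal sketch: train one scikit-learn model per store in parallel with
# a grouped pandas function, logging each run to MLflow. Column names and the
# `sales` DataFrame are illustrative assumptions.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LinearRegression

# Expected output of each per-group training task.
result_schema = "store_id long, rmse double, run_id string"

def train_per_store(pdf: pd.DataFrame) -> pd.DataFrame:
    """Train one model on a single store's data and log it to MLflow."""
    store_id = int(pdf["store_id"].iloc[0])
    X, y = pdf[["price", "promo"]], pdf["units_sold"]
    with mlflow.start_run(run_name=f"store_{store_id}") as run:
        model = LinearRegression().fit(X, y)
        rmse = float(((model.predict(X) - y) ** 2).mean() ** 0.5)
        mlflow.log_metric("rmse", rmse)
        mlflow.sklearn.log_model(model, "model")
    return pd.DataFrame(
        [[store_id, rmse, run.info.run_id]],
        columns=["store_id", "rmse", "run_id"],
    )

# Spark hands each store's rows to a worker as a pandas DataFrame,
# so many models train in parallel across the cluster.
per_store_models = sales.groupBy("store_id").applyInPandas(train_per_store, schema=result_schema)
```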
October 16, 2019 05:00 PM PT
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models.
To address these challenges, Databricks last year unveiled MLflow, an open source project that aims to simplify the entire ML lifecycle. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
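As a hedged example of those abstractions (illustrative only, not the tutorial's own code), the snippet below logs parameters, a metric, and a fitted scikit-learn model in a single MLflow tracking run.

```python
# A minimal sketch of MLflow tracking: record the configuration, the result,
# and the model artifact of one experiment. Dataset and model are illustrative.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    n_estimators = 100
    model = RandomForestRegressor(n_estimators=n_estimators).fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))

    mlflow.log_param("n_estimators", n_estimators)  # record config for reproducibility
    mlflow.log_metric("mse", mse)                   # compare runs in the tracking UI
    mlflow.sklearn.log_model(model, "model")        # package the model for deployment
```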
In the past year, the MLflow community has grown quickly: over 120 contributors from over 40 companies have contributed code to the project, and over 200 companies are using MLflow.
In this tutorial, we will show you how using MLflow can help you:
We will demo the building blocks of MLflow as well as the most recent additions since the 1.0 release.
What you will learn:
Prerequisites:
October 3, 2018 05:00 PM PT
Plumbing has been a key focus of modern software engineering, with our API/services/containers/devops-driven landscape, so it may come as a surprise that plumbing is where AI projects tend to fail. But it is precisely because modern software development focuses on decoupled plumbing that we have struggled to handle the rise of AI.
Specifically, companies use AI effectively when they can create end-to-end AI model factories that explicitly account for the coupling between data, models, and code.
In this talk, I will walk through what a model factory is and how MLflow's design supports the creation of end-to-end model factories. I will also share best practices I've observed while helping customers, from startups to Fortune 50 companies, create, productionize, and scale end-to-end ML pipelines, and while watching those pipelines produce serious, game-changing business impact.
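As a hedged sketch of what such a factory might look like with MLflow Projects (the project URIs, parameters, and paths below are hypothetical, not the speaker's actual pipeline), the example chains two packaged steps so that the data a model was trained on, the code that produced it, and the resulting artifacts stay explicitly linked in MLflow.

```python
# A minimal sketch of chaining MLflow Project runs into a pipeline.
# Repositories, parameters, and paths are hypothetical.
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Step 1: run a packaged featurization project against a raw data path.
etl = mlflow.projects.run(
    uri="https://github.com/example-org/featurize-project",  # hypothetical repo
    parameters={"raw_data_path": "/mnt/raw/events"},
)
features_uri = client.get_run(etl.run_id).info.artifact_uri + "/features"

# Step 2: train on the exact artifacts produced by step 1, so the lineage
# between data version, code version, and resulting model is kept in MLflow.
train = mlflow.projects.run(
    uri="https://github.com/example-org/train-project",      # hypothetical repo
    parameters={"training_data": features_uri, "max_depth": "8"},
)
print("trained model run:", train.run_id)
```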
Session hashtag: #SAISDS11