Mary Grace Moesta is currently a Data Science Consultant at Databricks working with our commercial and mid-market customers. As a former data scientist, she worked with Apache Spark on projects focused on machine learning and statistical inference, specifically in the retail / CPG space. With previous research in Markov Chain modeling and infectious disease modeling, she enjoys applying mathematics to real-world problems.
May 27, 2021 12:10 PM PT
With data as a valuable currency and the architecture of reliable, scalable Data Lakes and Lakehouses continuing to mature, it is crucial that machine learning training and deployment techniques keep up in order to realize that value. Reproducibility, efficiency, and governance in training and production environments rest on two things: point-in-time snapshots of the data, and a governing mechanism to regulate, track, and make best use of the associated metadata.
This talk will outline the challenges and importance of building and maintaining reproducible, efficient, and governed machine learning solutions, and will propose solutions built on open-source technologies, namely Delta Lake for data versioning and MLflow for efficiency and governance.
June 23, 2020 05:00 PM PT
Model deployment and integration often consist of several moving parts that require intricate steps woven together. Automating this pipeline and feedback loop can be incredibly challenging, especially in light of varying model development techniques. MLflow and the model registry can act as powerful tools to simplify building a robust CI/CD pattern for any given model. In this talk we will explore how MLflow, specifically the model registry, can be integrated with continuous integration, continuous development, and continuous deployment tools. We'll walk through an end-to-end example of designing a CI/CD process for a model deployment and implementing it with MLflow and automation tools.
October 16, 2019 05:00 PM PT
Instead of better understanding and optimizing their machine learning models, data scientists spend a majority of their time training and iterating through different models, even in cases where the data is reliable and clean. Important aspects of creating an ML model include (but are not limited to) data preparation, feature engineering, identifying the correct models, training (and continuing to train) the models, and optimizing them. This process can be (and often is) laborious and time-consuming.
In this session, we will explore this process and then show how the AutoML toolkit (from Databricks Labs) can significantly simplify and optimize machine learning. We will demonstrate all of this on financial loan risk data, with code snippets and notebooks that will be free to download.