SESSION

Scaling MLOps to Retrain 50k Weekly Models in Parallel Using UDFs

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Science and Machine Learning
INDUSTRY: Enterprise Technology
TECHNOLOGIES: AI/Machine Learning, Apache Spark, MLflow
SKILL LEVEL: Intermediate
DURATION: 40 min

At data.ai, our machine learning team uses the Databricks Platform to adopt MLOps best practices for high-frequency retraining. We use Databricks and MLflow to track experiments, improve code consistency, and safeguard model retraining against data volatility. However, as a global data provider delivering insights for the entire mobile marketplace, we face specific constraints when parallelizing model training across the large number of combinations required: we train ~6 models for each of >60 categories in >150 countries. In this session, I will describe the framework our team has created to incorporate MLOps into the weekly, parallel retraining of ~50k sklearn models. I will demonstrate how arbitrary code can be applied to grouped data using Pandas UDFs and, therefore, how MLflow logging and model registration can be applied at scale to any grouped data. Finally, I will discuss the limitations of this approach and how it might be adapted for a more time-sensitive use case.
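The grouped-training idea in the abstract can be illustrated with a minimal sketch. This is not the speaker's framework: the column names (`country`, `category`, `x`, `y`), the function names, and the result schema are all hypothetical, and `np.polyfit` stands in for an sklearn model to keep the example dependency-light. The per-group function takes one pandas DataFrame and returns one pandas DataFrame, which is the shape Spark's `groupBy(...).applyInPandas(func, schema)` expects; a local pandas loop is used here to exercise the same function without a cluster.

```python
import numpy as np
import pandas as pd

# Hypothetical result schema, written as a Spark DDL string.
RESULT_SCHEMA = (
    "country string, category string, n_rows long, "
    "slope double, intercept double"
)


def train_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one model on the rows of a single (country, category) group."""
    # Stand-in for an sklearn estimator: a degree-1 least-squares fit.
    slope, intercept = np.polyfit(
        pdf["x"].to_numpy(dtype=float), pdf["y"].to_numpy(dtype=float), deg=1
    )
    # In a real pipeline, MLflow calls such as mlflow.start_run and
    # mlflow.log_metric could be made here, once per group.
    return pd.DataFrame(
        {
            "country": [pdf["country"].iloc[0]],
            "category": [pdf["category"].iloc[0]],
            "n_rows": [len(pdf)],
            "slope": [float(slope)],
            "intercept": [float(intercept)],
        }
    )


def train_on_spark(spark_df):
    """Apply train_group to every group of a Spark DataFrame in parallel."""
    return spark_df.groupBy("country", "category").applyInPandas(
        train_group, schema=RESULT_SCHEMA
    )


def train_locally(pdf: pd.DataFrame) -> pd.DataFrame:
    """Run the same per-group function with plain pandas, for local testing."""
    parts = [train_group(g) for _, g in pdf.groupby(["country", "category"])]
    return pd.concat(parts, ignore_index=True)


# Tiny made-up dataset: two groups with known linear relationships.
xs = [0.0, 1.0, 2.0, 3.0]
demo = pd.DataFrame(
    {
        "country": ["US"] * 4 + ["FR"] * 4,
        "category": ["games"] * 4 + ["tools"] * 4,
        "x": xs + xs,
        "y": [2 * x + 1 for x in xs] + [-x + 3 for x in xs],
    }
)
results = train_locally(demo)
```

Because the per-group function is ordinary Python, the same code path can be unit-tested locally and then handed to Spark unchanged, with one output row per trained group.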

SESSION SPEAKERS

Kaleb Lowe

Staff Machine Learning Engineer
Data.AI