Building a Real-Time Model Monitoring Pipeline on Databricks
Deploying a model is rarely the final step in the ML lifecycle. Models can degrade over time for many reasons, so we need to own them in production, monitor them continuously for feature drift, concept drift, and distribution drift, and issue alerts or trigger retraining when necessary. With so many open source tools and frameworks available, it can be difficult to figure out how to make everything work together. In this technical deep dive, we will build a real-time ML model monitoring pipeline from the ground up using Apache Spark™ on Databricks.
In this session, we will introduce a use case in which we set up a model serving pipeline and log its predictions to a stream in real time. We will then configure a model metric monitoring pipeline that consumes from the stream and aggregates metrics over specific time windows. Finally, to see these metrics live on dashboards, we will integrate a model monitoring visualization pipeline.
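To make the windowed aggregation step concrete, here is a minimal sketch in plain Python of grouping logged predictions into fixed (tumbling) time windows and computing per-window metrics. It is an illustration under stated assumptions, not the pipeline built in the session: the `PredictionEvent` schema, the `aggregate` and `window_start` helpers, and the five-minute window width are all hypothetical, and an in-memory list stands in for the prediction stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class PredictionEvent:
    """One logged prediction. Hypothetical schema, not taken from the session."""
    timestamp: datetime  # timezone-aware event time
    model_version: str
    prediction: float


def window_start(ts: datetime, width: timedelta) -> datetime:
    """Floor a timestamp to the start of the tumbling window that contains it."""
    return EPOCH + ((ts - EPOCH) // width) * width


def aggregate(events, width=timedelta(minutes=5)):
    """Compute count and mean prediction per (window start, model version)."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, sum of predictions]
    for event in events:
        key = (window_start(event.timestamp, width), event.model_version)
        totals[key][0] += 1
        totals[key][1] += event.prediction
    return {
        key: {"count": count, "mean_prediction": total / count}
        for key, (count, total) in sorted(totals.items())
    }


if __name__ == "__main__":
    def at(minute):
        return datetime(2024, 1, 1, 12, minute, tzinfo=timezone.utc)

    log = [
        PredictionEvent(at(1), "v1", 0.2),
        PredictionEvent(at(4), "v1", 0.4),
        PredictionEvent(at(6), "v1", 0.9),
    ]
    for (start, version), metrics in aggregate(log).items():
        print(start.isoformat(), version, metrics)
```

In Spark Structured Streaming, the same idea is usually expressed by grouping on `pyspark.sql.functions.window` over the event-time column, with `withWatermark` bounding how long late events are accepted. The sketch above only shows the grouping logic and does not handle late or out-of-order data.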