
MLOps on Databricks with Vertex AI on Google Cloud

August 12, 2022 in Partners


Since the launch of Databricks on Google Cloud in early 2021, Databricks and Google Cloud have partnered to further integrate the Databricks platform into the cloud ecosystem and its native services. Databricks is built on or tightly integrated with many Google Cloud native services today, including Cloud Storage, Google Kubernetes Engine, and BigQuery. Databricks and Google Cloud are excited to announce an MLflow and Vertex AI deployment plugin to accelerate the model development lifecycle.

Why is MLOps difficult today?

The standard DevOps practices that software companies have adopted to allow rapid iteration and experimentation often do not translate well to data science work. Those practices include both human and technological concepts such as workflow management, source control, artifact management, and CI/CD. Machine learning adds complexity of its own, such as model tracking and model drift, so MLOps is difficult to put into practice today, and a good MLOps process needs the right tooling.

Today's machine learning (ML) ecosystem includes a diverse set of tools, each of which may specialize in and serve a portion of the ML lifecycle, but not many provide a full end-to-end solution. This is why Databricks teamed up with Google Cloud to build a seamless integration that leverages the best of MLflow and Vertex AI. It allows Data Scientists to safely train their models, Machine Learning Engineers to productionize and serve those models, and Model Consumers to get their predictions for business needs.

MLflow is an open source library developed by Databricks to manage the full ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. Vertex AI is Google Cloud’s unified artificial intelligence platform that offers an end-to-end ML solution, from model training to model deployment. With this new plugin, data scientists and machine learning engineers can train their models on Databricks' Managed MLflow, using the power of Apache Spark™ and open source Delta Lake (as well as the packaged ML Runtime, AutoML, and Model Registry). They can then deploy those models into production on Vertex AI for real-time model serving using pre-built Prediction images, and maintain model quality and freshness using model monitoring tools.

Note: The plugin has also been tested and works well with open source MLflow.

Technical Demo

Let's show you how to build an end-to-end MLOps solution using MLflow and Vertex AI. We will train a simple scikit-learn diabetes model with MLflow, save it into the Model Registry, and deploy it into a Vertex AI endpoint.

Before we begin, it's important to understand what goes on behind the scenes when using this integration. Looking at the reference architecture below, you can see the Databricks components and Google Cloud services used for this integration:


End-to-end MLOps solution using MLflow and Vertex AI

Note: The following steps assume that you have a Databricks on Google Cloud workspace deployed, with the right permissions to Vertex AI and Cloud Build set up on Google Cloud.

Step 1: Create a Service Account with the right permissions to access Vertex AI resources and attach it to a cluster running ML Runtime (MLR) 10.x.

Step 2: Install the google-cloud-mlflow plugin from PyPI onto your cluster. You can either install it directly on the cluster as a library or run the following pip command in a notebook attached to your cluster:

%pip install google-cloud-mlflow
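After the install, it can help to confirm that MLflow is able to see the plugin before going further. The sketch below lists the deployment targets registered under the `mlflow.deployments` entry-point group, which is where MLflow deployment plugins register themselves. The helper name `deployment_targets` is introduced here for illustration and is not part of the plugin; the check assumes the plugin registers the `google_cloud` target used later in this post.

```python
from importlib import metadata


def deployment_targets():
    """Return the sorted names of installed MLflow deployment plugin targets."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection of entry points.
        group = eps.select(group="mlflow.deployments")
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        group = eps.get("mlflow.deployments", [])
    return sorted(ep.name for ep in group)


targets = deployment_targets()
# Assumption: the google-cloud-mlflow plugin registers a target named "google_cloud".
print("google_cloud plugin visible:", "google_cloud" in targets)
```

If the check prints `False` right after installing, restarting the Python process attached to the notebook is a reasonable first thing to try.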

Then, in your notebook, import the following packages:

import mlflow
from mlflow.deployments import get_deploy_client
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes 
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import numpy as np

Step 3: Train, test, and autolog a scikit-learn experiment, including the hyperparameters used and test results with MLflow.

# load dataset
db = load_diabetes()
X = db.data
y = db.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
 
# mlflow.sklearn.autolog() requires mlflow 1.11.0 or above.
mlflow.sklearn.autolog()
 
# With autolog() enabled, all model parameters, a model score, and the fitted model are automatically logged.
with mlflow.start_run() as run:
  # Set the model parameters.
  n_estimators = 100
  max_depth = 6
  max_features = 3
  # Create and train the model.
  rf = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth, max_features=max_features)
  rf.fit(X_train, y_train)
  # Use the model to make predictions on the test dataset.
  predictions = rf.predict(X_test)

# The with block ends the run on exit, so no explicit mlflow.end_run() call is needed.
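The snippet above computes `predictions` on the test split but does not score them explicitly. A minimal sketch of one way to do that is shown below, written in plain Python so it has no dependencies beyond the standard library. The helper name `rmse` and the metric name `test_rmse` mentioned in the comment are assumptions made for illustration, not part of the original notebook.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only; in the notebook, pass y_test and predictions.
toy_score = rmse([0.0, 0.0], [3.0, 3.0])
print(toy_score)

# Inside the `with mlflow.start_run()` block, the score could be recorded with
# mlflow.log_metric("test_rmse", rmse(y_test, predictions)).
```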

Step 4: Log the model into the MLflow Model Registry, which saves model artifacts into Google Cloud Storage.

model_name = "vertex-sklearn-blog-demo"
mlflow.sklearn.log_model(rf, model_name, registered_model_name=model_name)


Registered Models in the MLflow Model Registry

Step 5: Programmatically get the latest version of the model using the MLflow Tracking Client. In a real-world scenario, you will likely transition the model from staging to production in your CI/CD process once the model has met production standards.

# The client class is spelled MlflowClient (not MLflowClient).
client = mlflow.tracking.MlflowClient()
model_version_infos = client.search_model_versions(f"name = '{model_name}'")
# Versions are cast to int so that they are compared numerically.
model_version = max(int(model_version_info.version) for model_version_info in model_version_infos)
model_uri = f"models:/{model_name}/{model_version}"

# model_uri should be models:/vertex-sklearn-blog-demo/1
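The version lookup above depends on comparing versions as integers. The sketch below isolates that logic so it can be checked without a registry. `VersionInfo` is a hypothetical stand-in for the objects returned by `search_model_versions`, assumed here to carry the version as a string (which is why the snippet above casts with `int`), and `latest_model_uri` is a helper name introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class VersionInfo:
    """Hypothetical stand-in for a registry model version record."""
    name: str
    version: str  # assumed to be reported as a string


def latest_model_uri(model_name, version_infos):
    """Build a models:/ URI that points at the highest numeric version."""
    versions = [int(v.version) for v in version_infos if v.name == model_name]
    if not versions:
        raise ValueError(f"no registered versions found for {model_name!r}")
    return f"models:/{model_name}/{max(versions)}"


infos = [VersionInfo("vertex-sklearn-blog-demo", v) for v in ("1", "2", "9", "10")]
uri = latest_model_uri("vertex-sklearn-blog-demo", infos)
print(uri)  # models:/vertex-sklearn-blog-demo/10
```

Without the `int` cast, `max` would compare the strings lexicographically and pick "9" over "10", so the cast matters once a model has ten or more versions.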

Step 6: Instantiate the Vertex AI client and deploy to an endpoint using just three lines of code.

# Really simple Vertex client instantiation
vtx_client = mlflow.deployments.get_deploy_client("google_cloud")
deploy_name = f"{model_name}-{model_version}"

# Deploy to Vertex AI using three lines of code! Note: If you are using a Python version later than 3.7, this may take up to 20 minutes.
deployment = vtx_client.create_deployment(
    name=deploy_name,
    model_uri=model_uri)

Step 7: Check the UI in Vertex AI and see the published model.


Vertex AI in the Google Cloud Console

Step 8: Invoke the endpoint using the plugin within the notebook for batch inference. In a real production scenario, you will likely invoke the endpoint from a web service or application for real-time inference.

# Use the .predict() method from the same plugin
predictions = vtx_client.predict(deploy_name, X_test)

The call should return a Prediction object like the following, which you can then parse into a pandas DataFrame and use for your business needs:

Prediction(predictions=[108.8213062661298, 121.8157069007118, 196.7929187443363, 159.9036896543356, 276.4400040206476, 100.4831327904369, 98.03313768162721, 170.2935904379434, 123.854209126032, 200.582723610864, 243.8882952682826, 89.56782205639794, 225.6276360204631, 183.9313416074667, 182.1405547852122, 179.3878755228988, 149.3434367420051, ...
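One way to get from that object to tabular data is sketched below. The `Prediction` namedtuple is a hypothetical stand-in for the plugin's class, assuming only the `predictions` attribute visible in the output above; `predictions_to_rows` and the column names are likewise introduced here for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the plugin's Prediction class; only the
# `predictions` attribute shown in the output above is assumed.
Prediction = namedtuple("Prediction", ["predictions"])


def predictions_to_rows(prediction, column="prediction"):
    """Turn a Prediction-like object into a list of row dictionaries."""
    return [
        {"row": i, column: float(value)}
        for i, value in enumerate(prediction.predictions)
    ]


sample = Prediction(predictions=[108.8213062661298, 121.8157069007118])
rows = predictions_to_rows(sample)
print(rows)

# In the notebook, pd.DataFrame(rows) would build a DataFrame with
# `row` and `prediction` columns from this list.
```

Keeping the row index alongside each value makes it straightforward to join the predictions back to the rows of `X_test` they were computed from.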

Conclusion

As you can see, MLOps doesn't have to be difficult. Using the end-to-end MLflow to Vertex AI solution, data teams can go from development to production in a matter of days rather than weeks, months, or sometimes never! For a live demo of the end-to-end workflow, check out the on-demand session "Accelerating MLOps Using Databricks and Vertex AI on Google Cloud" from DAIS 2022.

To start your ML journey, import the demo notebook into your workspace today. First-time customers can take advantage of partnership credits and start a free Databricks on Google Cloud trial. For any questions, please reach out to us using this contact form.
