train_model(Python)

Training a model and adding it to the MLflow Model Registry

dbutils.widgets.text(name="model_name", defaultValue="ml-gov-demo-wine-model", label="Model Name")
dbutils.widgets.combobox(name="trigger_pipeline", defaultValue="True", choices=["True", "False"], label="Trigger Pipeline")
model_name = dbutils.widgets.get("model_name")
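
Widget values are always returned as strings, and `bool("False")` is `True` in Python, so the `trigger_pipeline` value needs an explicit conversion before it can be used as a flag. The helper below is a minimal sketch; `widget_to_bool` is a hypothetical name that does not appear in the original notebook. It also rejects unexpected input, since a combobox accepts free text.

```python
def widget_to_bool(value: str) -> bool:
    """Convert a widget's string value ("True"/"False") to a bool."""
    normalized = value.strip().lower()
    if normalized not in ("true", "false"):
        raise ValueError(f"expected 'True' or 'False', got {value!r}")
    return normalized == "true"


# In the notebook this would be called as:
#   trigger_pipeline = widget_to_bool(dbutils.widgets.get("trigger_pipeline"))
print(widget_to_bool("True"), widget_to_bool("False"))
```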

Connect to an MLflow tracking server

MLflow can collect data about a model training session, such as validation accuracy. It can also save artifacts produced during the training session, such as a PySpark pipeline model.

By default, these data and artifacts are stored on the cluster's local filesystem. However, they can also be stored remotely using an MLflow Tracking Server.

import mlflow
mlflow.__version__

# Using the hosted mlflow tracking server
Out[15]: '1.7.0'

Training a model

Download training data

First, download the wine quality dataset (published by Cortez et al.) that will be used to train the model.

#%sh wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
wine_data_path = "/dbfs/FileStore/tables/winequality_red-42ff5.csv"
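
The UCI file is semicolon-delimited with quoted column names, so it helps to check that the uploaded copy parses into separate columns. The sketch below assumes the uploaded file keeps that layout. The two rows are a made-up sample reduced to three columns, not real data from the notebook.

```python
import io

import pandas as pd

# Made-up sample in the semicolon-delimited layout of the UCI file,
# reduced to three columns for brevity.
sample_csv = io.StringIO(
    '"fixed acidity";"alcohol";"quality"\n'
    "7.4;9.4;5\n"
    "7.8;9.8;5\n"
)

# Passing the delimiter explicitly avoids relying on automatic detection.
sample = pd.read_csv(sample_csv, sep=";")
print(sample.columns.tolist())
print(sample.dtypes)
```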

In an MLflow run, train and save an ElasticNet model for rating wines

We will train a model using Scikit-learn's Elastic Net regression module. We will fit the model inside a new MLflow run (training session), allowing us to save performance metrics, hyperparameter data, and model artifacts for future reference. If MLflow has been connected to a tracking server, this data will be persisted to the tracking server's file and artifact stores, allowing other users to view and download it. For more information about model tracking in MLflow, see the MLflow tracking reference.

Later, we will use the saved MLflow model artifacts to deploy the trained model to Azure ML for real-time serving.

import os
import warnings
import sys

import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import ElasticNet

import mlflow
import mlflow.sklearn


def eval_metrics(actual, pred):
    rmse = np.sqrt(mean_squared_error(actual, pred))
    mae = mean_absolute_error(actual, pred)
    r2 = r2_score(actual, pred)
    return rmse, mae, r2


def train_model(wine_data_path, model_path, alpha, l1_ratio):
    warnings.filterwarnings("ignore")
    np.random.seed(40)

    # Read the wine-quality CSV file. sep=None asks pandas to detect the delimiter,
    # which requires the Python parsing engine.
    data = pd.read_csv(wine_data_path, sep=None, engine="python")

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Start a new MLflow training run 
    with mlflow.start_run():
        # Fit the Scikit-learn ElasticNet model
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        predicted_qualities = lr.predict(test_x)

        # Evaluate the performance of the model using several accuracy metrics
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log model hyperparameters and performance metrics to the MLflow tracking server
        # (or to local disk if no tracking server is configured)
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)

        mlflow.sklearn.log_model(lr, model_path)
        
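        # run_id replaces the deprecated run_uuid attribute
        return mlflow.active_run().info.run_id

alpha_1 = 0.75
l1_ratio_1 = 0.25
model_path = 'model'
run_id1 = train_model(wine_data_path=wine_data_path, model_path=model_path, alpha=alpha_1, l1_ratio=l1_ratio_1)
# Build the URI from model_path so it stays in sync with the logged artifact path
model_uri = "runs:/" + run_id1 + "/" + model_path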
Elasticnet model (alpha=0.750000, l1_ratio=0.250000):
  RMSE: 0.7837307525653582
  MAE: 0.6165474987409884
  R2: 0.1297029612600864
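
To make the three reported metrics concrete, here is a small worked example using the same scikit-learn functions as `eval_metrics`. The scores and predictions are made up for illustration. An R2 close to 0, as in the run above, means the model does only slightly better than always predicting the mean quality.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

actual = np.array([3.0, 5.0, 7.0])  # made-up quality scores
pred = np.array([2.0, 5.0, 9.0])    # made-up predictions

# Errors are [1, 0, -2], so squared errors are [1, 0, 4].
rmse = np.sqrt(mean_squared_error(actual, pred))  # sqrt(5 / 3)
mae = mean_absolute_error(actual, pred)           # (1 + 0 + 2) / 3 = 1.0
# R2 = 1 - SS_res / SS_tot = 1 - 5 / 8 = 0.375
r2 = r2_score(actual, pred)
print(rmse, mae, r2)
```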
print(model_uri)
runs:/9fc18078ac9b48489081074df17c17bb/model
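
A `runs:/` URI combines the run ID with the artifact path that was passed to `log_model`. The helper below is a small sketch of that composition. `runs_uri` is a hypothetical name and not part of MLflow.

```python
def runs_uri(run_id: str, artifact_path: str) -> str:
    """Compose a runs:/ URI from a run ID and an artifact path."""
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


# Reproduces the URI printed above from the run ID and model_path.
print(runs_uri("9fc18078ac9b48489081074df17c17bb", "model"))
# prints: runs:/9fc18078ac9b48489081074df17c17bb/model
```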

Register the Model in the Model Registry

import time
result = mlflow.register_model(
    model_uri,
    model_name
)
# Model version creation is asynchronous; pause briefly so it can finish
time.sleep(10)
version = result.version
Successfully registered model 'tech-summit-wine-model'. Created version '1' of model 'tech-summit-wine-model'.
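
A fixed `time.sleep(10)` can be too short or needlessly long. One alternative is to poll the version's status until it leaves `PENDING_REGISTRATION`. The sketch below is one possible approach, not the notebook's method. `wait_until_registered` is a hypothetical helper, and a stand-in status source is used so the example runs without a registry.

```python
import time
from typing import Callable


def wait_until_registered(get_status: Callable[[], str],
                          timeout_s: float = 60.0,
                          poll_s: float = 1.0) -> str:
    """Poll get_status() until it stops returning PENDING_REGISTRATION."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status != "PENDING_REGISTRATION":
            return status  # "READY" or "FAILED_REGISTRATION"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout_s} s")
        time.sleep(poll_s)


# Stand-in status source for illustration. In the notebook this could be:
#   client = mlflow.tracking.MlflowClient()
#   get_status = lambda: client.get_model_version(model_name, version).status
fake_statuses = iter(["PENDING_REGISTRATION", "PENDING_REGISTRATION", "READY"])
print(wait_until_registered(lambda: next(fake_statuses), poll_s=0.01))
```

The caller should still check that the returned status is `READY` before using the version, since `FAILED_REGISTRATION` also ends the wait.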