Get Started with MLflow with R

Description: MLflow Quick Start: Training and Logging

In this tutorial we will:

  • Install MLflow for R on a Databricks cluster
  • Train a regression model on the wine quality dataset and log metrics, parameters and models
  • View the results of training in the MLflow tracking UI
  • Explore batch model serving

Setup

  1. This notebook was tested using DBR 8.0 and MLflow 1.14.1
  2. Attach this notebook to your cluster

Installing MLflow

## Install MLflow from CRAN, along with carrier
install.packages('mlflow')
install.packages('carrier')
 
## Load the library and others we need for the notebook
library(mlflow)
library(httr)
library(SparkR)
library(glmnet)
library(carrier)
 
## Complete the installation
install_mlflow()

Organize MLflow Runs into Experiments

As you start using your MLflow server for more tasks, you may want to separate them out. MLflow lets you create experiments to organize your runs. Use mlflow_create_experiment() and specify a path in your workspace for the experiment to live. This returns an Experiment ID, which you will need later when searching runs.

# Replace with your path
mlflow_create_experiment("/Shared/EstimatingWineQuality")
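Since mlflow_create_experiment() returns the new experiment's ID, you can capture it in a variable rather than copying it from the UI later. A minimal sketch, reusing the example path above (run this instead of the cell above, as creating the same experiment twice raises an error):

```r
## Capture the returned Experiment ID for later use,
## e.g. when calling mlflow_search_runs()
experiment_id <- mlflow_create_experiment("/Shared/EstimatingWineQuality")
```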

To record your run to an experiment, reuse the path to the experiment in your workspace with the mlflow_set_experiment() function.

mlflow_set_experiment("/Shared/EstimatingWineQuality")

Load Wine Quality Data

We'll use the UCI Machine Learning wine quality data set for this exercise.

# Read the wine-quality csv files
reds <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep = ";")
whites <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", sep = ";")
 
wine_quality <- rbind(reds, whites)
 
head(wine_quality)

Define Function to Train with Input Parameters

## Create a function to train based on different parameters
train_wine_quality <- function(data, alpha, lambda, model_name = "model") {
 
# Split the data into training and test sets. (0.75, 0.25) split.
sampled <- base::sample(1:nrow(data), 0.75 * nrow(data))
train <- data[sampled, ]
test <- data[-sampled, ]
 
# The predicted column is "quality" which is a scalar from [3, 9]
train_x <- as.matrix(train[, !(names(train) == "quality")])
test_x <- as.matrix(test[, !(names(test) == "quality")])
train_y <- train[, "quality"]
test_y <- test[, "quality"]
 
## Define the parameters used in each MLflow run
alpha <- mlflow_param("alpha", alpha, "numeric")
lambda <- mlflow_param("lambda", lambda, "numeric")
 
with(mlflow_start_run(), {
    model <- glmnet(train_x, train_y, alpha = alpha, lambda = lambda, family= "gaussian", standardize = FALSE)
    l1se <- cv.glmnet(train_x, train_y, alpha = alpha)$lambda.1se
    predictor <- carrier::crate(~ glmnet::predict.glmnet(!!model, as.matrix(.x)), !!model, s = l1se)
  
    predicted <- predictor(test_x)
 
    rmse <- sqrt(mean((predicted - test_y) ^ 2))
    mae <- mean(abs(predicted - test_y))
    r2 <- as.numeric(cor(predicted, test_y) ^ 2)
 
    message("Elasticnet model (alpha=", alpha, ", lambda=", lambda, "):")
    message("  RMSE: ", rmse)
    message("  MAE: ", mae)
    message("  R2: ", mean(r2, na.rm = TRUE))
 
    ## Log the parameters associated with this run
    mlflow_log_param("alpha", alpha)
    mlflow_log_param("lambda", lambda)
  
    ## Log metrics we define from this run
    mlflow_log_metric("rmse", rmse)
    mlflow_log_metric("r2", mean(r2, na.rm = TRUE))
    mlflow_log_metric("mae", mae)
  
    # Save plot to disk
    png(filename = "ElasticNet-CrossValidation.png")
    plot(cv.glmnet(train_x, train_y, alpha = alpha), label = TRUE)
    dev.off()
  
    ## Log that plot as an artifact
    mlflow_log_artifact("ElasticNet-CrossValidation.png")
 
    mlflow_log_model(predictor, model_name)
  
  })
}
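Before wiring this function into MLflow, it can help to sanity-check the split-and-score logic on its own. The sketch below reproduces the 75/25 split and the RMSE/MAE calculations from the function body on a small synthetic data frame, using a trivial mean predictor in place of glmnet (base R only, no MLflow needed):

```r
set.seed(42)
synthetic <- data.frame(x = rnorm(100), quality = rnorm(100))

# Same 75/25 split as in train_wine_quality()
sampled <- base::sample(1:nrow(synthetic), 0.75 * nrow(synthetic))
train <- synthetic[sampled, ]
test  <- synthetic[-sampled, ]

# Trivial baseline: predict the training mean for every test row
predicted <- rep(mean(train$quality), nrow(test))

# Same error metrics as in the function body
rmse <- sqrt(mean((predicted - test$quality) ^ 2))
mae  <- mean(abs(predicted - test$quality))

nrow(train)  # 75
nrow(test)   # 25
```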

Training Runs with Different Hyperparameters

You could, of course, train with different data sets here as well.

set.seed(98118)
 
model_name <- "model"
 
## Run 1
train_wine_quality(data = wine_quality, alpha = 0.03, lambda = 0.98, model_name)
 
## Run 2
train_wine_quality(data = wine_quality, alpha = 0.14, lambda = 0.4, model_name)
 
## Run 3
train_wine_quality(data = wine_quality, alpha = 0.20, lambda = 0.99, model_name)
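Because the function is parameterized, sweeping a hyperparameter grid is just a loop. A hedged sketch (the alpha and lambda values are illustrative; each call logs its own MLflow run, exactly as the three runs above do):

```r
## Build a small grid of (alpha, lambda) pairs
grid <- expand.grid(alpha = c(0.05, 0.5), lambda = c(0.1, 0.9))

## One MLflow run per combination
for (i in seq_len(nrow(grid))) {
  train_wine_quality(data = wine_quality,
                     alpha = grid$alpha[i],
                     lambda = grid$lambda[i],
                     model_name = model_name)
}
```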

Review the MLflow UI

Visit your tracking server by opening up the experiment in your workspace.

Inside the UI, you can:

  • View your experiments and runs under /Shared/EstimatingWineQuality/experiment_id
  • Review the parameters and metrics on each run
  • Click each run for a detailed view of the model, images, and other artifacts produced

Model Serving via Batch Process

MLflow supports many options for model serving. Here we demonstrate the simplest and most common option, batch scoring, using mlflow_load_model() to fetch a previously logged model from the tracking server and load it into memory. This requires the path to the model in artifact storage, also known as the model_uri. The URI is composed of the path to a run's artifact storage plus a sub-folder named after the model. Consider the following model URI:

dbfs:/databricks/mlflow/<experiment_id>/<run_id>/artifacts/model

Here is what each component refers to:

dbfs:/databricks/mlflow/experiment_id/run_id/artifact_storage/model_name
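Given those components, assembling a URI is plain string concatenation. A minimal sketch with hypothetical experiment and run IDs (yours will come from the tracking server):

```r
# Hypothetical IDs, for illustration only
experiment_id <- "3389303"
run_id <- "8b6f6ec5"
model_name <- "model"

# Join the components with "/" to form the model URI
model_uri <- paste("dbfs:/databricks/mlflow", experiment_id, run_id,
                   "artifacts", model_name, sep = "/")
model_uri  # "dbfs:/databricks/mlflow/3389303/8b6f6ec5/artifacts/model"
```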

We can discover the model URI programmatically.

# Search runs for best r^2
runs <- mlflow_search_runs(experiment_ids = "ENTER_YOUR_EXPERIMENT_ID", order_by = "metrics.r2 DESC")
 
# Construct model URI
model_uri <- paste(runs$artifact_uri[1], model_name, sep = "/")
message("Model URI:\n", model_uri)
## Load the model
best_model <- mlflow_load_model(model_uri = model_uri)
 
## Generate prediction on 5 rows of data 
predictions <- data.frame(mlflow_predict(best_model, data = wine_quality[1:5, !(names(wine_quality) == "quality")]))
                          
names(predictions) <- "wine_quality_pred"
 
## Take a look
display(predictions)