What is Managed MLflow?
Managed MLflow is built on top of MLflow, an open source platform developed by Databricks to help manage the complete machine learning lifecycle, and adds enterprise reliability, security and scale.
Accelerate and simplify machine learning lifecycle management with a standardized framework for developing production-ready ML models.
With managed MLflow Recipes, you can bootstrap ML projects, perform rapid iteration with ease and ship high-quality models to production at scale.
Run experiments with any ML library, framework or language, and automatically keep track of parameters, metrics, code and models from each experiment. By using MLflow on Databricks, you can securely share, manage and compare experiment results along with corresponding artifacts and code versions — thanks to built-in integrations with the Databricks Workspace and notebooks.
Use one central place to discover and share ML models, collaborate on moving them from experimentation to online testing and production, integrate with approval and governance workflows and CI/CD pipelines, and monitor ML deployments and their performance. The MLflow Model Registry facilitates sharing of expertise and knowledge, and helps you stay in control.
Quickly deploy production models for batch inference on Apache Spark™ or as REST APIs using built-in integration with Docker containers, Azure ML or Amazon SageMaker. With Managed MLflow on Databricks, you can operationalize and monitor production models using Databricks Jobs Scheduler and auto-managed Clusters to scale based on the business needs.
MLflow Tracking: Automatically log parameters, code versions, metrics, and artifacts for each run using the Python, REST, R, and Java APIs.
MLflow Tracking Server: Get started quickly with a built-in tracking server to log all runs and experiments in one place. No configuration needed on Databricks.
Experiment Management: Create, secure, organize, search, and visualize experiments from within the Workspace with access control and search queries.
MLflow Run Sidebar: Automatically track runs from within notebooks and capture a snapshot of your notebook for each run, so that you can always go back to previous versions of your code.
Logging Data with Runs: Log parameters, data sets, metrics, artifacts and more as runs to local files, to a SQLAlchemy compatible database, or remotely to a tracking server.
Delta Lake Integration: Track large-scale data sets that fed your models with Delta Lake snapshots.
Artifact Store: Store large files, such as models, in Amazon S3, Azure Blob Storage, Google Cloud Storage, an SFTP server, NFS, or local file paths.
Simplified project startup: MLflow Recipes provides out-of-box connected components for building and deploying ML models.
Accelerated model iteration: MLflow Recipes creates standardized, reusable steps for model iteration — making the process faster and less expensive.
Automated team handoffs: Opinionated structure provides modularized production-ready code, enabling automatic handoff from experimentation to production.
MLflow Projects: MLflow projects allow you to specify the software environment that is used to execute your code. MLflow currently supports the following project environments: Conda environment, Docker container environment, and system environment. Any Git repo or local directory can be treated as an MLflow project.
Remote Execution Mode: Run MLflow Projects from Git or local sources remotely on Databricks clusters using the Databricks CLI to quickly scale your code.
MLflow Model Registry
Central Repository: Register MLflow models with the MLflow Model Registry. A registered model has a unique name, version, stage, and other metadata.
Model Versioning: Automatically keep track of versions for registered models when updated.
Model Stage: Assign preset or custom stages, such as “Staging” and “Production”, to each model version to represent the lifecycle of a model.
CI/CD Workflow Integration: Record stage transitions, request, review and approve changes as part of CI/CD pipelines for better control and governance.
Model Stage Transitions: Record new registration events or changes as activities that automatically log users, changes, and additional metadata such as comments.
MLflow Models: A standard format for packaging machine learning models that can be used in a variety of downstream tools — for example, real-time serving through a REST API or batch inference on Apache Spark.
Model Customization: Use Custom Python Models and Custom Flavors for models from an ML library that is not explicitly supported by MLflow’s built-in flavors.
Built-In Model Flavors: MLflow provides several standard flavors that might be useful in your applications, like Python and R functions, H2O, Keras, MLeap, PyTorch, scikit-learn, Spark MLlib, TensorFlow, and ONNX.
Built-In Deployment Tools: Quickly deploy on Databricks via Apache Spark UDF, on a local machine, or in several other production environments such as Microsoft Azure ML and Amazon SageMaker, or build Docker images for deployment.
See our Product News from Azure Databricks and AWS to learn more about our latest features.
How it works
MLflow is a lightweight set of APIs and user interfaces that can be used with any ML framework throughout the machine learning workflow. It includes four components: MLflow Tracking, MLflow Projects, MLflow Models and MLflow Model Registry.
MLflow Tracking: Record and query experiments: code, data, config, and results.
MLflow Projects: Packaging format for reproducible runs on any platform.
MLflow Models: General format for sending models to diverse deployment tools.
MLflow Model Registry: Centralized repository to collaboratively manage MLflow models throughout the full lifecycle.
Managed MLflow on Databricks is a fully managed version of MLflow providing practitioners with reproducibility and experiment management across Databricks Notebooks, Jobs, and data stores, with the reliability, security, and scalability of the Unified Data Analytics Platform.