AI and Machine Learning

Accelerate your AI projects with a data-centric approach to machine learning

Built on an open lakehouse architecture, AI and Machine Learning on Databricks empowers ML teams to prepare and process data, streamlines cross-team collaboration, and standardizes the full ML lifecycle from experimentation to production, including for generative AI and large language models.

Cona

$6M+ in savings

CONA Services uses Databricks for the full ML lifecycle to optimize the supply chain for hundreds of thousands of stores.

VIA

R$3.9M in savings

Via leverages machine learning to accurately forecast demand, reducing compute costs by 25%.

Amgen

$50M+ in cost reduction

Amgen improves data science collaboration to accelerate drug discovery and save operational costs.

Simplify all aspects of data for AI and ML

Because Databricks ML is built on an open lakehouse foundation with Delta Lake, you can empower your machine learning teams to access, explore and prepare any type of data at any scale. Turn features into production pipelines in a self-service manner without depending on data engineering support.

Automate experiment tracking and governance

Managed MLflow automatically tracks your experiments and logs parameters, metrics, data and code versions, and model artifacts with each training run. You can quickly review previous runs, compare results and reproduce a past result as needed. Once you have identified the best version of a model for production, register it to the Model Registry to simplify handoffs along the deployment lifecycle.
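As a rough illustration of the tracking described above, the sketch below records the parameters and final metric of one run with the open source MLflow Python API (`mlflow.start_run`, `mlflow.log_params`, `mlflow.log_metric`). The toy `train` function, the parameter names and the `final_loss` metric name are assumptions made for this example, not part of Databricks or MLflow. The `mlflow` import is kept inside `tracked_run` so the toy training helper can be tried without the package installed.

```python
def train(learning_rate: float, epochs: int) -> float:
    """Toy stand-in for a training loop (hypothetical); returns a fake loss."""
    loss = 1.0
    for _ in range(epochs):
        # Pretend each epoch shrinks the loss by a fixed fraction.
        loss *= (1.0 - learning_rate)
    return loss


def tracked_run(learning_rate: float = 0.1, epochs: int = 3) -> float:
    """Train once and log the parameters and final loss to MLflow."""
    import mlflow  # requires the open source `mlflow` package

    with mlflow.start_run():
        # Parameter and metric names below are illustrative assumptions.
        mlflow.log_params({"learning_rate": learning_rate, "epochs": epochs})
        loss = train(learning_rate, epochs)
        mlflow.log_metric("final_loss", loss)
    return loss
```

Calling `tracked_run()` several times with different arguments would produce separate runs that can then be compared side by side in the MLflow UI.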

Manage the full model lifecycle from data to production — and back

Once trained models are registered, you can collaboratively manage them through their lifecycle with the Model Registry. Models can be versioned and moved through various stages, like experimentation, staging, production and archived. The lifecycle management integrates with approval and governance workflows according to role-based access controls. Comments and email notifications provide a rich collaborative environment for data teams.
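To make the idea of versioned models moving through stages more concrete, here is a minimal, purely illustrative sketch in plain Python. It is not the Model Registry API: the `ModelVersion` class, its fields, and the `approved_by` argument are hypothetical names chosen for this example. Only the stage names come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Stage names taken from the description above.
STAGES = ("experimentation", "staging", "production", "archived")


@dataclass
class ModelVersion:
    """Hypothetical record of one registered model version."""
    name: str
    version: int
    stage: str = "experimentation"
    # Audit trail of (from_stage, to_stage, approver) tuples.
    history: List[Tuple[str, str, str]] = field(default_factory=list)

    def transition(self, new_stage: str, approved_by: str) -> None:
        """Move to a new stage and record who approved the change."""
        if new_stage not in STAGES:
            raise ValueError(f"unknown stage: {new_stage!r}")
        self.history.append((self.stage, new_stage, approved_by))
        self.stage = new_stage


mv = ModelVersion(name="demand_forecast", version=1)
mv.transition("staging", approved_by="reviewer_a")
mv.transition("production", approved_by="reviewer_b")
```

Keeping the approver alongside each transition is one simple way to picture how stage changes could tie into approval and governance workflows.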

Deploy ML models at scale and low latency

Deploy models with a single click without having to worry about server management or scale constraints. With Databricks, you can deploy your models as REST API endpoints anywhere with enterprise-grade availability.

Use generative AI and large language models

Integrate existing pretrained models — such as those from the Hugging Face transformers library or other open source libraries — into your workflow. Transformer pipelines make it easy to use GPUs and allow batching of items sent to the GPU for better throughput. 
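The batching mentioned above can be sketched as follows. The `batched` helper only illustrates the grouping idea in plain Python; the Hugging Face `pipeline` does its own batching internally when given a `batch_size`. The `classify` wrapper, the choice of the `"sentiment-analysis"` task, and the default values are assumptions for this example. `device=-1` selects the CPU and `device=0` the first GPU.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def classify(texts: List[str], batch_size: int = 8, device: int = -1):
    """Run a pretrained text classifier over `texts` in batches."""
    from transformers import pipeline  # requires the `transformers` package

    # Downloads the task's default pretrained model on first use.
    classifier = pipeline("sentiment-analysis", device=device)
    # The pipeline groups inputs into batches of `batch_size` internally.
    return classifier(texts, batch_size=batch_size)
```

Larger batches generally improve GPU throughput up to the point where memory runs out, so the batch size is usually tuned per model and hardware.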

Customize a model on your data for your specific task. With the support of open source tooling, such as Hugging Face and DeepSpeed, you can quickly and efficiently take a foundation LLM and train it on your own data to improve accuracy for your domain and workload. This also gives you control over the data used for training, so you can make sure you’re using AI responsibly.

Product components

Collaborative Notebooks

Databricks notebooks natively support Python, R, SQL and Scala so practitioners can work together with the languages and libraries of their choice to discover, visualize and share insights.

Machine Learning Runtime

One-click access to preconfigured ML-optimized clusters, powered by a scalable and reliable distribution of the most popular ML frameworks (such as PyTorch, TensorFlow and scikit-learn), with built-in optimizations for unmatched performance at scale.

Feature Store

Facilitate the reuse of features with a data lineage–based feature search that leverages automatically logged data sources. Make features available for training and serving with simplified model deployment that doesn’t require changes to the client application.

AutoML

Empower everyone from ML experts to citizen data scientists with a “glass box” approach to AutoML that not only delivers the highest-performing model but also generates code for further refinement by experts.

Managed MLflow

Built on top of MLflow — the world’s leading open source platform for the ML lifecycle — Managed MLflow helps ML models quickly move from experimentation to production, with enterprise security, reliability and scale.

Production-Grade Model Serving

Serve models at any scale with one-click simplicity, with the option to leverage serverless compute.

Model Monitoring

Monitor model performance and how it affects business metrics in real time. Databricks delivers end-to-end visibility and lineage from models in production back to source data systems, helping you analyze model and data quality across the full ML lifecycle and pinpoint issues before they have a damaging impact.

Repos

Repos allows engineers to follow Git workflows in Databricks, enabling data teams to leverage automated CI/CD workflows and code portability.

Large Language Models

Databricks makes it simple to access LLMs and integrate them into your workflows and provides platform capabilities for fine-tuning LLMs using your own data, resulting in better domain performance.

Migrate to Databricks

Tired of the data silos, slow performance and high costs associated with legacy systems like Hadoop and enterprise data warehouses? Migrate to the Databricks Lakehouse: the modern platform for all your data, analytics and AI use cases.
