GPUs power today’s most advanced AI workloads, from forecasting and recommendations to multimodal foundation models. Yet teams struggle to procure and manage GPU infrastructure, configure distributed training environments, and debug data loading bottlenecks. Deep learning researchers would rather focus on modeling than on troubleshooting infrastructure.
We’re excited to announce the Public Preview of AI Runtime (AIR), a new training stack that enables on-demand distributed GPU training on A10s and H100s. AI Runtime contains all the technology used for large-scale training of LLMs such as MPT and DBRX. Even in Beta, several hundred customers, including Rivian, FactSet, and YipitData, have used AIR to train and ship deep learning models into production. Use cases span the gamut from computer vision models to recommendation systems to fine-tuned LLMs for agentic tasks. Our own Databricks AI Research team used AIR for reinforcement learning of models, as in our recent KARL paper.
With AI Runtime, Databricks users now have the following capabilities.

For interactive development and debugging, connect to on-demand A10s and H100s in Databricks Notebooks with just a few clicks. From there, leverage all the developer ergonomics that Databricks is known for, from environment management for common Python packages to agent-powered authoring and debugging with Genie Code. Easily mount data from the Lakehouse to train deep learning models, or invoke a fleet of remote CPUs from your GPU-powered notebook to run Spark data preparation workloads.
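To make that concrete, here is a minimal sketch of the notebook flow, assuming a hypothetical Lakehouse feature table named main.demo.training_features; the tiny model and column names are placeholders as well:

```python
import torch
import torch.nn as nn

# Pull a (small) feature table to the GPU driver with the notebook's built-in
# Spark session. The catalog/schema/table name below is hypothetical.
features = spark.read.table("main.demo.training_features").toPandas()

X = torch.tensor(features[["f1", "f2", "f3"]].values, dtype=torch.float32).cuda()
y = torch.tensor(features["label"].values, dtype=torch.float32).cuda()

# A deliberately tiny model: the point is the workflow, not the architecture.
model = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1)).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```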

Use Genie Code to help resolve performance bottlenecks, experiment with new architectures, or track down tricky issues such as model convergence problems and cryptic framework errors.
AI Runtime is a production-grade platform for accelerated computing. Develop your deep learning code in interactive notebooks, then use the full power of Lakeflow to submit and orchestrate jobs on GPU compute. Lakeflow can execute both notebooks and custom code repositories as long-running or scheduled jobs. For production needs such as CI/CD (continuous integration and continuous deployment), AI Runtime is fully compatible with our Declarative Automation Bundles (DABs).
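As one hedged illustration of the jobs path, the Databricks Python SDK can register a training notebook as a job; the job name and notebook path below are hypothetical, and the GPU compute configuration is omitted for brevity:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up workspace auth from the environment

# Register the training notebook as a job. Name and path are hypothetical;
# attach your GPU compute configuration per the AI Runtime docs.
job = w.jobs.create(
    name="nightly-gpu-training",
    tasks=[
        jobs.Task(
            task_key="train",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/ml/train_model"),
        )
    ],
)
print(f"Created job {job.job_id}")
```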
With our Lakeflow integration, customers can keep model training and fine-tuning tightly synchronized with upstream data pipelines and downstream production systems.
“Databricks' AI Runtime greatly streamlined the process of training a custom Text To Formula (TTF) model. With no infrastructure setup or delays, it was easy to choose the right compute based on prompt size and output token generation. This allowed us to move quickly, maintain our Lakehouse workflows, and deliver a high-quality model with full governance, reducing the time to set up, train, and deploy our model from days to hours.”— Nikhil Sunderraj, Principal Machine Learning Engineer, FactSet Research Systems, Inc.

Distributed training workloads can be painful to prepare, debug, and observe. From troubleshooting RDMA setups to tracking telemetry across multiple GPUs to getting the software configuration right, users can easily miss critical details that dramatically slow model training.
Instead, AI Runtime is optimized for the entire deep learning lifecycle—and is designed to save you time. Key dependencies like PyTorch and CUDA come pre-installed, along with optimized support for distributed training frameworks such as Ray, Hugging Face Transformers, Composer, and other libraries, so you can start training immediately without managing environments. Customers are also welcome to bring their own libraries, from Unsloth to TorchRec to custom training loops.
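For instance, the pre-installed PyTorch stack supports a standard DistributedDataParallel loop out of the box. The sketch below uses a synthetic model and synthetic data purely for illustration:

```python
# Minimal single-node DDP sketch for the pre-installed PyTorch stack.
# Launch with: torchrun --nproc_per_node=8 train.py (8 GPUs on one node).
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")  # NCCL for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = DDP(nn.Linear(128, 1).cuda(), device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    for step in range(100):
        x = torch.randn(32, 128, device="cuda")  # synthetic batch
        loss = model(x).pow(2).mean()  # stand-in loss for illustration
        optimizer.zero_grad()
        loss.backward()  # DDP all-reduces gradients across the ranks
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```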

Integrated SDKs and observability tools simplify the management of distributed training workloads. MLflow enables deep observability of GPU workloads, with automatic tracking of GPU utilization and training experiments. Whether you're fine-tuning foundation models or training forecasting and personalization models, the runtime is optimized to accelerate training workflows with minimal setup.
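As a small illustration of the experiment-tracking side (run name, parameters, and metric values are placeholders), standard MLflow logging calls are all that's needed inside a training loop:

```python
import mlflow

# mlflow.autolog() can capture framework-level metrics automatically; the
# explicit calls below show the manual equivalent.
with mlflow.start_run(run_name="h100-finetune-demo"):  # hypothetical run name
    mlflow.log_params({"lr": 1e-3, "batch_size": 32})
    for epoch in range(3):
        train_loss = 1.0 / (epoch + 1)  # placeholder value for illustration
        mlflow.log_metric("train_loss", train_loss, step=epoch)
```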

Today’s Public Preview of AI Runtime supports distributed training across 8x H100s on a single node, with multi-node support currently in Private Preview.
"Databricks' AI Runtime enables us to efficiently run LLM workloads (fine tuning and inference) without infrastructure overhead, directly in our lakehouse. This seamless integration simplifies our pipelines and provides efficient use of GPUs, enabling us to deliver high quality AI insights to our customers and focus on innovation, not on infrastructure."— Lucas Froguel, Senior AI Platform Engineer, YipitData
AI Runtime integrates natively with the Databricks Lakehouse, enabling you to run and govern GPU workloads where your data resides. This eliminates fragmented workflows and simplifies the path from experimentation to production.
Your AI workloads run fully within your enterprise data perimeter, delivering strong governance and security without sacrificing flexibility for experimentation and scale.
"Leveraging Databricks' serverless GPU support within our Lakehouse enables us to efficiently train advanced audio and multimodal models without infrastructure overhead. This seamless integration simplifies workflows and provides efficient use of GPU resources, ensuring we deliver high-performance systems and focus on innovation."— Arjuna Siva, VP of Infotainment & Connectivity, Rivian and Volkswagen Group Technologies
Demand for accelerated compute continues to grow across AI workloads and agentic systems. AI Runtime enables more Databricks customers to leverage NVIDIA hardware to accelerate their AI workloads and drive their business forward. We are excited to continue partnering with NVIDIA to bring the latest NVIDIA technology, like the RTX PRO 4500 Blackwell Server Edition announced at GTC 2026, to our customers.
"As AI adoption accelerates across industries, organizations need scalable, high-performance infrastructure to power their data and AI workloads. NVIDIA technologies bring accelerated performance to the AI Runtime offering for the Databricks Lakehouse Platform."— Pat Lee, Vice President, Strategic Partnerships at NVIDIA.
To help you get started, we’ve put together several template notebooks and starter guides.
Please reach out to your account team to learn more or if you have any questions!