Thousands of enterprises already use Llama models on the Databricks Data Intelligence Platform to power AI applications, agents, and workflows. Today, we're excited to partner with Meta to bring you their latest model series, Llama 4, available now in many Databricks workspaces and rolling out across AWS, Azure, and GCP.
Llama 4 marks a major leap forward in open, multimodal AI—delivering industry-leading performance, higher quality, larger context windows, and improved cost efficiency from the Mixture of Experts (MoE) architecture. All of this is accessible through the same unified REST API, SDK, and SQL interfaces, making it easy to use alongside all your models in a secure, fully governed environment.
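To make the unified interface concrete, here is a minimal sketch of querying a Llama 4 serving endpoint from Python through the OpenAI-compatible client that Databricks Model Serving exposes. The workspace URL and the endpoint name `databricks-llama-4-maverick` are placeholders for illustration; substitute whatever appears on your workspace's Serving page. The same endpoint can also be called from SQL (for example with the `ai_query` function) or through the REST API directly.

```python
# Minimal sketch: calling a Llama 4 endpoint on Databricks via the
# OpenAI-compatible serving interface. Endpoint name and workspace URL
# below are assumptions; replace them with the values from your workspace.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],  # Databricks personal access token
    base_url="https://<your-workspace>.cloud.databricks.com/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-llama-4-maverick",  # assumed endpoint name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key terms in this contract in three bullets."},
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
```

Because the endpoint speaks the same API as your other served models, swapping Llama 4 into an existing application is typically a one-line change to the model name.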
The Llama 4 models raise the bar for open foundation models, delivering significantly higher quality and faster inference than any previous Llama model.
At launch, we're introducing Llama 4 Maverick, the largest and highest-quality model in today's release from Meta. Maverick is purpose-built for developers creating sophisticated AI products, combining multilingual fluency, precise image understanding, and safe assistant behavior. It enables:
And you can now build all of this with significantly better performance. Compared to Llama 3.3 (70B), Maverick delivers:
Coming soon to Databricks is Llama 4 Scout—a compact, best-in-class multimodal model that fuses text, image, and video from the start. With up to 10 million tokens of context, Scout is built for advanced long-form reasoning, summarization, and visual understanding.
“With Databricks, we could automate tedious manual tasks by using LLMs to process one million+ files daily for extracting transaction and entity data from property records. We exceeded our accuracy goals by fine-tuning Meta Llama and, using Mosaic AI Model Serving, we scaled this operation massively without the need to manage a large and expensive GPU fleet.”