Published: June 11, 2025
by Hanlin Tang, Akhil Gupta, Patrick Wendell, and Naveen Rao
Last year, we unveiled data intelligence – AI that can reason on your enterprise data – with the arrival of the Databricks Mosaic AI stack for building and deploying agent systems. Since then, we’ve had thousands of customers bring AI into production. This year at the Data and AI Summit, we are excited to announce several key products:
Agent Bricks is a new way to build high-quality agents that are auto-optimized on your data. Just provide a high-level description of the agent’s task and connect your enterprise data — Agent Bricks handles the rest. Agent Bricks is optimized for common industry use cases, including structured information extraction, reliable knowledge assistance, custom text transformation, and building multi-agent systems. We use the latest in agentic research from the Databricks Mosaic AI research team to automatically build evaluations and optimize agent quality. For more details, see the Agent Bricks deep dive blog.
We are releasing MLflow 3, redesigned from the ground up for Generative AI, with the latest in monitoring, evaluation, and lifecycle management. With MLflow 3, you can monitor and observe agents deployed anywhere, even outside of Databricks: agents running on AWS, GCP, or on-premises systems can now connect to MLflow 3 for agent observability.
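As a minimal sketch, here is how an agent running outside Databricks might emit traces to MLflow 3. The experiment path and agent logic are placeholders, and we assume Databricks credentials (DATABRICKS_HOST and DATABRICKS_TOKEN) are available in the environment.

```python
# A minimal sketch: instrumenting an agent running outside Databricks
# (e.g., on AWS or on-premises) so its traces land in MLflow 3.
import mlflow

# Point the tracing client at your Databricks workspace; this assumes
# DATABRICKS_HOST and DATABRICKS_TOKEN are set in the environment.
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/support-agent-observability")  # placeholder path

@mlflow.trace  # records inputs, outputs, and latency as a trace span
def answer_question(question: str) -> str:
    # ... call your LLM or retrieval chain here; this stub stands in for it ...
    return "stub answer for: " + question

answer_question("How do I rotate my API keys?")
```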
MLflow 3 also includes a prompt registry, which lets you register, version, test, and deploy LLM prompts for your agent systems.
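A minimal sketch of the prompt registry workflow follows; it assumes the `mlflow.genai` registry APIs, and the prompt name, template, and question are illustrative. Treat exact signatures as subject to change and check the MLflow 3 docs.

```python
# A minimal sketch of the MLflow 3 prompt registry; names and signatures
# follow the mlflow.genai APIs but may vary across releases.
import mlflow.genai

# Register a versioned prompt; re-registering the same name creates version 2, 3, ...
mlflow.genai.register_prompt(
    name="support_agent_system_prompt",                          # placeholder name
    template="You are a helpful support agent. Answer: {{question}}",
    commit_message="Initial version",
)

# Later, load a pinned version and fill in its template variables.
prompt = mlflow.genai.load_prompt("prompts:/support_agent_system_prompt/1")
print(prompt.format(question="How do I reset my password?"))
```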
AI Functions let users access the power of generative AI directly from SQL. This year, we are excited to share that AI Functions deliver dramatic performance improvements and expanded multi-modal capabilities. AI Functions are now up to 3x faster and up to 4x lower cost than other vendors on large-scale workloads, so you can process massive data transformations with unprecedented speed.
Beyond performance, AI Functions now support multi-modal capabilities, allowing you to work seamlessly across text, images, and other data types. New functions like ai_parse_document make it effortless to extract structured information from complex documents, unlocking insights from previously hard-to-process enterprise content.
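As a hedged sketch, here is how these functions can be invoked from PySpark in a Databricks notebook (where `spark` and `display` are predefined). The volume path, view name, and model endpoint are placeholders for your own.

```python
# Runs in a Databricks notebook. The volume path and endpoint name are placeholders.
parsed = spark.sql("""
    SELECT
      path,
      ai_parse_document(content) AS parsed  -- structured output from raw files
    FROM read_files('/Volumes/main/default/contracts/', format => 'binaryFile')
""")
parsed.createOrReplaceTempView("parsed_docs")

# ai_query applies an LLM to each row at scale.
summaries = spark.sql("""
    SELECT
      path,
      ai_query(
        'databricks-meta-llama-3-3-70b-instruct',
        CONCAT('Summarize this document: ', CAST(parsed AS STRING))
      ) AS summary
    FROM parsed_docs
""")
display(summaries)
```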
Mosaic AI Vector Search forms the backbone of many retrieval systems, especially RAG agents, and is one of the fastest-growing products at Databricks. We've now rewritten the infrastructure from scratch on the principle of separating compute and storage. Our new Storage-Optimized Vector Search can scale to billions of vectors while delivering 7x lower cost. This breakthrough makes it economically feasible to build sophisticated RAG applications and semantic search systems across your entire data estate. Whether you're powering customer support chatbots or enabling advanced document discovery, you can now scale without prohibitive costs. See our detailed blog post for a technical deep dive and performance benchmarks.
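A minimal sketch with the Vector Search Python SDK is below. The endpoint, index, table, and embedding endpoint names are placeholders, and the `STORAGE_OPTIMIZED` endpoint type is our assumption for the new tier; check the docs for the exact value in your workspace.

```python
# A minimal sketch using the Databricks Vector Search SDK; all names are placeholders.
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()

# Create an endpoint on the new storage-optimized tier (assumed type name).
client.create_endpoint(
    name="docs-endpoint",
    endpoint_type="STORAGE_OPTIMIZED",
)

# Sync a Delta table into an index, embedding the `text` column on write.
index = client.create_delta_sync_index(
    endpoint_name="docs-endpoint",
    index_name="main.default.docs_index",
    source_table_name="main.default.docs",
    pipeline_type="TRIGGERED",
    primary_key="id",
    embedding_source_column="text",
    embedding_model_endpoint_name="databricks-gte-large-en",
)

# Retrieve the nearest documents for a query.
hits = index.similarity_search(
    query_text="How do I configure SSO?",
    columns=["id", "text"],
    num_results=5,
)
```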
We are announcing a major step forward in serverless compute with the introduction of GPU support in the Databricks serverless platform. GPU-powered AI workloads are now more accessible than ever, with this fully managed service eliminating the complexity of GPU management. Whether you're training models, running inference, or processing large-scale data transformations, Serverless GPU compute provides the performance you need without the operational overhead. Fully integrated into the Databricks platform, Serverless GPU compute offers on-demand access to A10G GPUs (in Beta today) and H100s (coming soon), without locking you into long-term reservations. Run notebooks on serverless GPUs and submit them as jobs, with the full governance of Unity Catalog.
Enterprise AI applications demand higher throughput and lower latency for production readiness. Our enhanced Model Serving infrastructure now supports over 250,000 queries per second (QPS). Bring your real-time online ML workloads to Databricks and let us handle the infrastructure and reliability challenges so you can focus on AI model development.
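As a minimal sketch, a real-time serving endpoint can be queried over REST as shown below. The endpoint name and feature columns are placeholders, and credentials are assumed to be in the environment.

```python
# A minimal sketch of querying a real-time Model Serving endpoint over REST.
# Placeholders: DATABRICKS_HOST (e.g. https://<workspace>.cloud.databricks.com),
# DATABRICKS_TOKEN, and the hypothetical "fraud-scorer" endpoint.
import os
import requests

resp = requests.post(
    f"{os.environ['DATABRICKS_HOST']}/serving-endpoints/fraud-scorer/invocations",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json={"dataframe_records": [{"amount": 129.99, "country": "US"}]},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # model predictions
```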
For LLM serving, we've launched a new proprietary, in-house inference engine in all regions. The engine incorporates many of our private innovations and custom kernels to accelerate inference for Meta Llama and other open-source LLMs. On common workloads, it is up to 1.5x faster than properly configured open-source engines such as vLLM v1. Together with the rest of our LLM serving infrastructure, these innovations make serving LLMs on Databricks easier, faster, and often lower in total cost than DIY serving solutions.
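Because Databricks LLM serving exposes an OpenAI-compatible API, the standard `openai` client works against a workspace. In this sketch, the model name is an example Foundation Model endpoint and credentials are assumed to be in the environment.

```python
# A minimal sketch of calling a Databricks-served LLM via the OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url=f"{os.environ['DATABRICKS_HOST']}/serving-endpoints",
)

completion = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",  # example endpoint name
    messages=[{"role": "user", "content": "Summarize our Q2 churn drivers."}],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```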
From chatbots to recommendation engines, your AI services can now scale to handle even the most demanding enterprise workloads.
Anthropic’s Model Context Protocol (MCP) is a popular protocol for providing tools and knowledge to large language models, and we’ve now integrated MCP directly into the Databricks platform. MCP servers can be hosted on Databricks Apps, giving you a seamless way to deploy and manage MCP-compliant services without additional infrastructure management. You can also interact with and test MCP-enabled models directly in our Playground environment, making it easier to experiment with different model configurations and capabilities.
Additionally, your agents can now leverage Databricks capabilities through Databricks-hosted MCP servers for UC functions, Genie, and Vector Search. To learn more, see our documentation.
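As a hedged sketch using the official `mcp` Python SDK, an agent might list the tools exposed by a Databricks-hosted MCP server as follows. The URL pattern shown is our assumption; check the documentation for the exact path on your workspace.

```python
# A minimal sketch: listing tools from a Databricks-hosted MCP server
# with the official MCP Python SDK (streamable HTTP transport).
import asyncio
import os

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

# Assumed URL pattern for a UC-functions MCP server; verify in the docs.
SERVER_URL = "https://<your-workspace>/api/2.0/mcp/functions/main/default"

async def main() -> None:
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
    async with streamablehttp_client(SERVER_URL, headers=headers) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```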
Mosaic AI Gateway is now generally available. This unified entry point for all your AI services provides centralized governance, usage logging, and control across your entire AI application portfolio. We've also added a host of new capabilities, from automatic fallback between providers to PII and safety guardrails. With AI Gateway, you can implement rate-limit policies, track usage, and enforce safety guardrails on AI workloads, whether they run on Databricks or through external services.
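A minimal sketch of configuring gateway policies programmatically with the Databricks SDK (`databricks-sdk`) follows. The endpoint name is a placeholder, and exact type names may vary by SDK version, so treat this as a sketch rather than a definitive recipe.

```python
# A minimal sketch: attaching rate limits and usage tracking to a serving
# endpoint via AI Gateway. Endpoint name is a placeholder.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import (
    AiGatewayRateLimit,
    AiGatewayRateLimitKey,
    AiGatewayRateLimitRenewalPeriod,
    AiGatewayUsageTrackingConfig,
)

w = WorkspaceClient()
w.serving_endpoints.put_ai_gateway(
    name="support-llm",  # hypothetical endpoint
    rate_limits=[
        AiGatewayRateLimit(
            calls=100,                                         # per-user budget
            key=AiGatewayRateLimitKey.USER,
            renewal_period=AiGatewayRateLimitRenewalPeriod.MINUTE,
        )
    ],
    usage_tracking_config=AiGatewayUsageTrackingConfig(enabled=True),
)
```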
These announcements represent our continued commitment to making enterprise AI more accessible, performant, and cost-effective. Each innovation builds upon our data intelligence platform, ensuring that your AI applications can leverage the full power of your enterprise data while maintaining the governance and security standards your organization requires.
Ready to explore these new capabilities? Start with our free tier or reach out to your Databricks representative to learn how these innovations can accelerate your AI initiatives.