Session
High-Throughput, Low-Latency: The Databricks Playbook for Production Model Serving
Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Enterprise Technology |
| Technologies | Agent Bricks |
| Skill Level | Intermediate |
Modern AI applications don’t just need accurate models; they need to serve them at massive scale, with low latency and strict reliability guarantees. In this session, we’ll go under the hood of Databricks Model Serving to show how we support high-QPS (300K+), production-grade workloads on a unified Lakehouse platform. We’ll walk through the serving architecture, autoscaling strategies, and optimizations across GPU/CPU utilization, request routing, and caching that enable sustained high throughput without sacrificing latency or cost. You’ll see real patterns from customer deployments, learn how to design your endpoints for bursty and always-on traffic, and leave with practical guidance for running mission-critical ML and LLM workloads on Databricks Model Serving.
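As a taste of the endpoint-design topic above: a minimal sketch of a Model Serving endpoint configuration, using field names from the public Databricks serving-endpoints API. The endpoint and model names here are hypothetical; for bursty traffic you might enable scale-to-zero as shown, while an always-on, latency-sensitive workload would typically disable it and provision a larger workload size.

```json
{
  "name": "fraud-scorer",
  "config": {
    "served_entities": [
      {
        "entity_name": "main.models.fraud_model",
        "entity_version": "3",
        "workload_size": "Small",
        "scale_to_zero_enabled": true
      }
    ]
  }
}
```

The trade-off: scale-to-zero saves cost during idle periods but adds cold-start latency on the first request after scale-down, which is why the session distinguishes bursty from always-on traffic patterns.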
Session Speakers
Anshul Gupta
Sr. Staff Software Engineer
Databricks
Mike Del Balso
Databricks