Session

Mission-Critical Inference: Powering High-Scale AI in Production

Overview

Experience	In Person
Track	Artificial Intelligence & Agents
Industry	Enterprise Technology
Technologies	Databricks Agents
Skill Level	Advanced

You've shipped your model. Now it needs to serve millions of requests a day without blinking: at the right latency, the right cost, and with zero downtime. The architecture that handled your prototype won't cut it in production, and most teams learn that the hard way.

This session walks through the architecture, tuning knobs, and operational patterns that let Databricks Custom Model Serving handle the world's most demanding inference workloads. You'll see how GPU autoscaling matches capacity to traffic, how request batching drives down cost per call, how traffic splitting enables safe rollouts, and how built-in observability lets you catch regressions before users do.

Bring your hardest production challenges. You'll leave with concrete patterns for running high-scale AI inference reliably, and the playbook to keep it fast, governed, and cost-efficient as demand grows.

Session Speakers

Ankit Mathur

Sr. Staff Software Engineer
Databricks

Brian Law

Sr. Specialist Solutions Architect
Databricks