Session

Mission-Critical Inference: Powering High-Scale AI in Production

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Enterprise Technology
Technologies: Agent Bricks
Skill Level: Advanced
You've shipped your model. Now it needs to serve millions of requests a day without blinking: at the right latency, at the right cost, and with zero downtime. The architecture that handled your prototype won't cut it in production, and most teams learn that the hard way.

This session walks through the architecture, tuning knobs, and operational patterns that let Databricks Custom Model Serving handle the world's most demanding inference workloads. You'll see how GPU autoscaling matches capacity to traffic, how request batching drives down cost per call, how traffic splitting enables safe rollouts, and how built-in observability lets you catch regressions before users do.

Bring your hardest production challenges. You'll leave with concrete patterns for running high-scale AI inference reliably, and the playbook to keep it fast, governed, and cost-efficient as demand grows.
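As a taste of the traffic-splitting pattern covered in the session, here is a minimal sketch of a Model Serving endpoint configuration that routes a small slice of traffic to a new model version during a rollout. The entity names, versions, and percentages are illustrative placeholders, and the payload shape is assumed from the Databricks serving-endpoints REST API; consult the official docs for the authoritative schema.

```python
# Illustrative sketch (not official sample code): an endpoint config that
# serves two versions of the same model and splits traffic 90/10 between
# them, so the new version can be validated on live traffic before a full
# cutover. "catalog.schema.my_model" and the version numbers are placeholders.

endpoint_config = {
    "served_entities": [
        {
            "name": "my-model-v1",
            "entity_name": "catalog.schema.my_model",
            "entity_version": "1",
            "workload_size": "Small",
            "scale_to_zero_enabled": False,  # keep warm for steady latency
        },
        {
            "name": "my-model-v2",
            "entity_name": "catalog.schema.my_model",
            "entity_version": "2",
            "workload_size": "Small",
            "scale_to_zero_enabled": False,
        },
    ],
    "traffic_config": {
        "routes": [
            # Canary rollout: most traffic stays on the proven version.
            {"served_model_name": "my-model-v1", "traffic_percentage": 90},
            {"served_model_name": "my-model-v2", "traffic_percentage": 10},
        ]
    },
}

# Sanity check: route percentages must cover all traffic.
total = sum(
    route["traffic_percentage"]
    for route in endpoint_config["traffic_config"]["routes"]
)
assert total == 100
```

Promoting the new version is then just a config update that shifts the percentages, rather than a redeploy, which is what makes the rollout (and any rollback) safe.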

Session Speakers


Ankit Mathur

Sr. Staff Software Engineer
Databricks


Tejas Sundaresan

Sr. Product Manager
Databricks