Mission-Critical Inference: Powering High-Scale AI in Production
Overview
| Field | Value |
|---|---|
| Experience | In Person |
| Track | Artificial Intelligence & Agents |
| Industry | Enterprise Technology |
| Technologies | Agent Bricks |
| Skill Level | Advanced |
You've shipped your model. Now it needs to serve millions of requests a day without blinking — at the right latency, the right cost, and with zero downtime. The architecture that handled your prototype won't cut it in production, and most teams learn that the hard way.

This session walks through the architecture, tuning knobs, and operational patterns that let Databricks Custom Model Serving handle the world's most demanding inference workloads. You'll see how GPU autoscaling matches capacity to traffic, how request batching drives down cost per call, how traffic splitting enables safe rollouts, and how built-in observability lets you catch regressions before users do.

Bring your hardest production challenges — you'll leave with concrete patterns for running high-scale AI inference reliably, and the playbook to keep it fast, governed, and cost-efficient as demand grows.
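To make the knobs above concrete, here is a minimal sketch of creating a serving endpoint with a GPU workload, scale-to-zero, and a 90/10 traffic split between two model versions via the Databricks serving-endpoints REST API. The endpoint name, model names, versions, and sizes are hypothetical, and the field names reflect the publicly documented schema; verify against the current Databricks docs before relying on them.

```python
# Sketch: create a serving endpoint that splits traffic 90/10 between two
# model versions, with GPU-backed serving and scale-to-zero. All names and
# versions below are hypothetical placeholders.
import os
import requests

HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://my-workspace.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token

endpoint_config = {
    "name": "prod-recommender",  # hypothetical endpoint name
    "config": {
        "served_entities": [
            {
                "name": "recommender-v1",
                "entity_name": "ml.prod.recommender",  # Unity Catalog model (hypothetical)
                "entity_version": "12",
                "workload_type": "GPU_MEDIUM",         # GPU-backed serving
                "workload_size": "Medium",             # concurrency band the endpoint autoscales within
                "scale_to_zero_enabled": True,         # release capacity when idle
            },
            {
                "name": "recommender-v2",
                "entity_name": "ml.prod.recommender",
                "entity_version": "13",
                "workload_type": "GPU_MEDIUM",
                "workload_size": "Small",
                "scale_to_zero_enabled": True,
            },
        ],
        # Safe rollout: send 10% of traffic to the candidate version.
        "traffic_config": {
            "routes": [
                {"served_model_name": "recommender-v1", "traffic_percentage": 90},
                {"served_model_name": "recommender-v2", "traffic_percentage": 10},
            ]
        },
    },
}

resp = requests.post(
    f"{HOST}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=endpoint_config,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```

Ramping the candidate's `traffic_percentage` upward as metrics hold, then retiring the old route, is the canary-style pattern the abstract describes as traffic splitting for safe rollouts.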
Session Speakers
Ankit Mathur
Sr. Staff Software Engineer
Databricks
Tejas Sundaresan
Sr. Product Manager
Databricks