The 52x Multiplier: How Zepto Mastered AI Agent Evaluation at Scale
Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Retail & Consumer Goods |
| Technologies | Unity Catalog, Agent Bricks |
| Skill Level | Intermediate |
In quick commerce, delivery windows are measured in minutes, which makes AI reliability existential. At Zepto, scaling to 80,000 autonomously handled support tickets per day demanded more than “vibe-based” testing: it required treating evaluation as core infrastructure. This session presents the dual-loop framework (powered by MLflow 3.0 and DSPy) that turns operational risk into a competitive advantage.
We achieved a verified 52x ROI on evaluation infrastructure, cutting support costs by 65% while raising CSAT by 20.5%. By shifting from reactive fixes to real-time tracing, we compressed issue detection from 3 hours to 5 minutes and reduced social media escalations by 74%. We will deconstruct the technical blueprint for establishing “Evaluation as Infrastructure”, covering LLM-as-a-Judge calibration, strategic sampling, and automated feedback loops, so that teams can deploy complex, multimodal AI agents with confidence and control.
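As an illustration of what LLM-as-a-Judge calibration can involve, the sketch below measures how well a judge's verdicts agree with human labels on the same tickets, using Cohen's kappa. It is a minimal, generic example and not Zepto's implementation: the function name `cohens_kappa`, the `pass`/`fail` label set, and the sample data are all hypothetical.

```python
from collections import Counter


def cohens_kappa(human, judge):
    """Chance-corrected agreement between human labels and judge labels.

    Hypothetical helper for illustration; both inputs are equal-length
    sequences of categorical labels for the same items.
    """
    if len(human) != len(judge) or not human:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(human)
    # Fraction of items where the judge matches the human reviewer.
    observed = sum(h == j for h, j in zip(human, judge)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    human_counts = Counter(human)
    judge_counts = Counter(judge)
    labels = set(human) | set(judge)
    expected = sum(human_counts[l] * judge_counts[l] for l in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Assumed sample data: four tickets graded by a human and by the judge.
human_labels = ["pass", "pass", "fail", "fail"]
judge_labels = ["pass", "fail", "fail", "fail"]
kappa = cohens_kappa(human_labels, judge_labels)
print(f"kappa = {kappa:.2f}")
```

A low kappa on a labeled sample would suggest revising the judge prompt or rubric before trusting its scores at scale; the threshold to act on is a team decision and is not specified in this abstract.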
Session Speakers
Gireesh Sreedhar K P
Sr Delivery Solution Architect
Databricks
Deepak Dhankani
Associate Director - Data Science
Zepto