Session

The 52x Multiplier: How Zepto Mastered AI Agent Evaluation at Scale

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Retail & Consumer Goods
Technologies: Unity Catalog, Agent Bricks
Skill Level: Intermediate

In quick commerce, delivery windows are measured in minutes, making AI reliability existential. At Zepto, scaling to 80,000 daily autonomous support tickets demanded more than “vibe-based” testing; it required treating evaluation as core infrastructure. This session unveils the comprehensive dual-loop framework (powered by MLflow 3.0 and DSPy) that transforms operational risk into a competitive advantage.

We achieved a verified 52x ROI on evaluation infrastructure, slashing support costs by 65% while boosting CSAT by 20.5%. By shifting from reactive fixes to real-time tracing, we compressed issue detection from 3 hours to 5 minutes and reduced social media escalations by 74%. We will deconstruct the technical blueprint for establishing “Evaluation as Infrastructure” (covering LLM-as-a-Judge calibration, strategic sampling, and automated feedback loops), enabling teams to deploy complex, multimodal AI agents with absolute confidence and control.

Session Speakers

Gireesh Sreedhar K P

Sr Delivery Solution Architect
Databricks

Deepak Dhankani

Associate Director - Data Science
Zepto