Session

The 52x Multiplier: How Zepto Mastered AI Agent Evaluation at Scale

Overview

Experience	In Person
Track	Artificial Intelligence & Agents
Industry	Retail & Consumer Goods
Technologies	Unity Catalog, Databricks Agents
Skill Level	Intermediate

In quick commerce, delivery windows are measured in minutes, making AI reliability existential. At Zepto, scaling to 80,000 daily autonomous support tickets demanded more than “vibe-based” testing; it required treating evaluation as core infrastructure. This session presents the dual-loop framework (powered by MLflow 3.0 and DSPy) that turns operational risk into a competitive advantage.

We achieved a verified 52x ROI on evaluation infrastructure, cutting support costs by 65% while raising CSAT by 20.5%. By shifting from reactive fixes to real-time tracing, we compressed issue detection from 3 hours to 5 minutes and reduced social media escalations by 74%. We will deconstruct the technical blueprint for establishing “Evaluation as Infrastructure” (covering LLM-as-a-Judge calibration, strategic sampling, and automated feedback loops), enabling teams to deploy complex, multimodal AI agents with confidence and control.

Session Speakers

Deepak Dhankani

Gireesh Sreedhar K P