Sponsored by: Galileo Technologies Inc. | Taming Rogue AI Agents with Observability-Driven Evaluation
Overview
Experience | In Person |
---|---|
Type | Breakout |
Track | Artificial Intelligence |
Industry | Enterprise Technology, Media and Entertainment, Financial Services |
Technologies | AI/BI |
Skill Level | Intermediate |
Duration | 40 min |
LLM agents often drift into failure when prompts, retrieval, external data, and policies interact in unpredictable ways. This technical session introduces a repeatable, metric-driven framework for detecting, diagnosing, and correcting these undesirable behaviors in agentic systems at production scale. We demonstrate how to instrument the agent loop with fine-grained signals (tool-selection quality, error rates, action progression, latency, and domain-specific metrics) and send them to an evaluation layer (e.g., Galileo). This telemetry enables a virtuous cycle of system improvement. We present a practical example of a stock-trading system and show how brittle retrieval and faulty business logic cause undesirable behavior. We then refactor the prompts and adjust the retrieval pipeline, verifying recovery through improved metrics. Attendees will learn how to add observability with minimal code changes, pinpoint root causes via tracing, and drive continuous, metric-validated improvement.
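To make the instrumentation idea concrete, here is a minimal sketch of wrapping each tool call in an agent loop so that latency, errors, and tool-selection quality are recorded per step. It is an illustration only and not the session's or Galileo's actual API: the `Trace` and `StepRecord` classes, the `expected_tool` label, and the toy `get_quote` / `place_order` tools are all hypothetical names chosen for this example. In a real system the collected records would be forwarded to whatever evaluation layer is in use.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class StepRecord:
    """One tool invocation inside the agent loop (hypothetical schema)."""
    tool: str
    expected_tool: Optional[str]  # ground-truth label, if one is available
    latency_s: float
    error: Optional[str]


@dataclass
class Trace:
    """Collects step records for a single agent run."""
    steps: List[StepRecord] = field(default_factory=list)

    def call(self, tool_name: str, fn: Callable[..., Any], *args: Any,
             expected_tool: Optional[str] = None, **kwargs: Any) -> Any:
        """Run a tool, recording latency and any exception instead of crashing."""
        start = time.perf_counter()
        result, error = None, None
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
        latency = time.perf_counter() - start
        self.steps.append(StepRecord(tool_name, expected_tool, latency, error))
        return result

    def summary(self) -> Dict[str, float]:
        """Aggregate the per-step signals into run-level metrics."""
        n = len(self.steps)
        if n == 0:
            return {"steps": 0, "error_rate": 0.0,
                    "tool_selection_accuracy": 0.0, "mean_latency_s": 0.0}
        labelled = [s for s in self.steps if s.expected_tool is not None]
        correct = sum(1 for s in labelled if s.tool == s.expected_tool)
        return {
            "steps": n,
            "error_rate": sum(1 for s in self.steps if s.error) / n,
            "tool_selection_accuracy": correct / len(labelled) if labelled else 0.0,
            "mean_latency_s": sum(s.latency_s for s in self.steps) / n,
        }


# Toy tools standing in for a stock-trading agent's actions (assumed for illustration).
PRICES = {"ACME": 12.5}


def get_quote(symbol: str) -> float:
    return PRICES[symbol]  # KeyError on an unknown symbol mimics brittle retrieval


def place_order(symbol: str, qty: int) -> Dict[str, Any]:
    if qty <= 0:
        raise ValueError("quantity must be positive")  # business-logic guard
    return {"symbol": symbol, "qty": qty}


def run_demo() -> Trace:
    trace = Trace()
    trace.call("get_quote", get_quote, "ACME", expected_tool="get_quote")
    trace.call("get_quote", get_quote, "UNKNOWN", expected_tool="get_quote")
    # The agent picks the wrong tool here and also passes a bad quantity.
    trace.call("place_order", place_order, "ACME", 0, expected_tool="get_quote")
    return trace


if __name__ == "__main__":
    print(run_demo().summary())
```

In this sketch the demo run yields three steps, two of which fail and one of which uses an unexpected tool, so the error rate and the tool-selection accuracy both move away from their ideal values. Re-running the same summary after a prompt or retrieval change is one simple way to check whether the fix actually improved the metrics.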