Closing the Agent Quality Loop Through Observability in Production
Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Enterprise Technology, Healthcare & Life Sciences, Financial Services |
| Technologies | Databricks Agents |
| Skill Level | Advanced |
You built an agent. It passed dev testing and shipped. But production is a different world, and now the real work begins: proving the agent stays high quality and making it better with every release. This session walks the agent quality loop end to end on Databricks with MLflow, from the seat of the AI Ops engineer supporting a production agent. We start with tracing and observability: real-time trace logging into Unity Catalog, out-of-the-box LLM judges, and a review app for SME labeling. We then turn that human feedback into trustworthy, scalable evaluation by aligning LLM judges to users' corrections. We surface problems through automated issue detection that opens trackable bugs, feed production traffic into larger, more targeted evaluation sets that serve as a CI gate, and ship repeatably with Declarative Automation Bundles. The result is an agent flywheel that de-risks upcoming releases and uncovers problems in existing ones. We demonstrate the whole loop live on a real composite agent.
Session Speakers
Arthur Dooner
Sr. Specialist Solutions Architect
Databricks
Euirim Choi
Software Engineer
Databricks