Session

Closing the Agent Quality Loop Through Observability in Production

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Enterprise Technology, Healthcare & Life Sciences, Financial Services
Technologies: Databricks Agents
Skill Level: Advanced

You built an agent. It passed dev testing and shipped. But production is a different world, and now the real work begins: proving the agent stays high quality and improving it with every release. This session walks through the agent quality loop end to end on Databricks with MLflow, from the perspective of an AI Ops engineer supporting a production agent. We start with tracing and observability: real-time trace logging into Unity Catalog, out-of-the-box LLM judges, and a review app for SME labeling. We turn that human feedback into trustworthy, scalable evaluation by aligning LLM judges with users' corrections. We surface problems through automated issue detection that opens trackable bugs, then feed production traffic into larger, more targeted evaluation sets that serve as a CI gate, and ship repeatably with Declarative Automation Bundles. The result is an agent flywheel that de-risks upcoming releases and uncovers problems in existing ones. Throughout the session, we demonstrate this entire loop live on a real composite agent.

Session Speakers


Arthur Dooner

Sr. Specialist Solutions Architect
Databricks

Euirim Choi

Software Engineer
Databricks