
Closing the Feedback Loop: Self-Evolving Agent Test Harness with MLflow

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Enterprise Technology
Technologies: Agent Bricks
Skill Level: Intermediate
"Most teams iterate on agent quality through vibe-checking: try a question, read the answer, fix what looks wrong, repeat. It works initially, but breaks down as the project grows. Each fix can revive an old bug or introduce a new class of issue. Coding agents add to the strain. They rewrite the agent in seconds while verification still happens by hand.Offline evaluation is one answer, but it is hard. Traditional eval requires setting up golden datasets, metrics, and judges b efore it pays off, and many teams don't get past the setup cost.This session introduces a new approach in MLflow: a self-evolving test harness for agent quality that builds itself from inside the vibe-checking loop. Each piece of feedback on a bad answer becomes an automated test, and every coding-agent fix runs against the accumulated suite.We'll walk through a live demo and share what we've learned."

Session Speakers


Yuki Watanabe

Senior Software Engineer
Databricks


Daniel Lok

Software Engineer
Databricks