Session
Building Production-Ready Agents with a Self-Evolving Test Suite

Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Enterprise Technology |
| Technologies | Databricks Agents |
| Skill Level | Intermediate |

Building agents is challenging. Most teams iterate on agent quality by vibe-checking. Try a question, fix what looks wrong, ship. It works at first and breaks down fast. Each fix can revive an old bug. Coding agents make this worse by rewriting the agent in seconds while verification stays manual.

This session introduces a new best practice. With one MLflow command, your coding agent turns into a rigorous builder that grows a test suite from inside the vibe-check loop. Every piece of feedback on a bad answer becomes an automated test. Every coding-agent fix runs against the accumulated suite, scored and recorded in MLflow. By the time your agent is ready to ship, you have a real test suite that grew alongside the build.

We walk through this on stage end to end. We start with a prototype agent that has no tracing and no eval, and we end with a production-ready agent backed by a test suite that grew alongside the build.

Session Speakers
Adam Gurary
Senior Product Manager
Databricks
Yuki Watanabe
Senior Software Engineer
Databricks