Sponsored by: Red Hat | Why AI Evaluation Breaks at Scale — and How to Fix It End-to-End
Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Enterprise Technology |
| Technologies | AI/BI, Unity Catalog, Agent Bricks |
| Skill Level | Beginner |
Most teams evaluate AI in pieces: one script for agent behavior, another for safety, another for adversarial testing, with no shared results, no audit trail, and no defensible answer to "is this ready to deploy?" The accumulation of disconnected tools isn't a gap you can close by adding another tool; it's an organizational and architectural failure, and it compounds as models and agents move into regulated or high-stakes environments. This session confronts the gap directly. We present Evaluation-Driven Development (EDD), a structured methodology that connects behavioral, safety, and adversarial evaluation layers into a single, reproducible workflow with an audit trail your compliance team can actually use, and we demonstrate it with open-source tooling that integrates with MLflow. Whether you own the model, the platform, or the risk, this session gives you a shared framework for answering the question that matters: is this AI-enabled capability ready to be deployed?
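To make the idea of a single, reproducible evaluation record concrete, here is a minimal Python sketch that logs behavioral, safety, and adversarial results into one MLflow run. The evaluator functions (`run_behavioral_suite`, `run_safety_suite`, `run_adversarial_suite`) and the metric names are hypothetical placeholders for illustration only, not the session's actual tooling.

```python
# Minimal sketch: record all three evaluation layers in one MLflow run
# so the results share a single audit trail. The evaluator functions and
# metric names below are hypothetical placeholders, not the tooling
# demonstrated in the session.
import mlflow


def run_behavioral_suite(model_uri: str) -> dict:
    # Placeholder: run task/behavior evaluations, return metric -> score.
    return {"task_success_rate": 0.91, "groundedness": 0.87}


def run_safety_suite(model_uri: str) -> dict:
    # Placeholder: run safety/policy evaluations.
    return {"harmful_response_rate": 0.02, "refusal_accuracy": 0.95}


def run_adversarial_suite(model_uri: str) -> dict:
    # Placeholder: run red-team / jailbreak probes.
    return {"jailbreak_success_rate": 0.04}


def evaluate(model_uri: str, release_candidate: str) -> None:
    with mlflow.start_run(run_name=f"edd-eval-{release_candidate}"):
        mlflow.set_tag("release_candidate", release_candidate)
        mlflow.log_param("model_uri", model_uri)

        layers = {
            "behavioral": run_behavioral_suite(model_uri),
            "safety": run_safety_suite(model_uri),
            "adversarial": run_adversarial_suite(model_uri),
        }
        for layer, metrics in layers.items():
            for name, value in metrics.items():
                mlflow.log_metric(f"{layer}/{name}", value)

        # Persist the full result set as an artifact for the audit trail.
        mlflow.log_dict(layers, "evaluation_report.json")


if __name__ == "__main__":
    evaluate("models:/my-agent/3", release_candidate="2024.06")
```

Because every layer lands in the same run, a reviewer can answer "is this ready to deploy?" from one record instead of reconciling outputs from disconnected scripts.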
Session Speakers
Carlos Condado
Sr. Product Marketing Manager
Red Hat