Session

Sponsored by: Red Hat | Why AI Evaluation Breaks at Scale — and How to Fix It End-to-End

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Enterprise Technology
Technologies: AI/BI, Unity Catalog, Agent Bricks
Skill Level: Beginner

Most teams evaluate AI in pieces: one script for agent behavior, another for safety, another for adversarial testing, with no shared results, no audit trail, and no defensible answer to "is this ready to deploy?" This accumulation of disconnected tools is not a gap you can close by adding another tool. It is an organizational and architectural failure, and it compounds as models and agents move into regulated or high-stakes environments.

This session confronts that gap directly. We introduce Evaluation-Driven Development (EDD), a structured methodology that connects behavioral, safety, and adversarial evaluation layers into a single, reproducible workflow, with an audit trail your compliance team can actually use, and we demonstrate it with open-source tooling that integrates with MLflow. Whether you own the model, the platform, or the risk, this session gives you a shared framework for answering the question that matters: is this AI-enabled capability ready to be deployed?
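The abstract does not prescribe an implementation, but the unifying idea (every evaluation layer writes to one shared, gated record rather than its own script) can be sketched in plain Python. All names below are illustrative assumptions, not part of any tool shown in the session:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class LayerResult:
    """One evaluation layer's outcome: behavioral, safety, or adversarial."""
    layer: str
    score: float      # 0.0 to 1.0, higher is better
    threshold: float  # minimum acceptable score for this layer

    @property
    def passed(self) -> bool:
        return self.score >= self.threshold

def evaluate_readiness(results: list[LayerResult]) -> dict:
    """Merge all layers into a single timestamped, auditable record.

    Deployment readiness is the conjunction of every layer's gate,
    so one failing layer blocks the release and the record shows why.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "layers": {
            r.layer: {"score": r.score,
                      "threshold": r.threshold,
                      "passed": r.passed}
            for r in results
        },
        "ready_to_deploy": all(r.passed for r in results),
    }

# Hypothetical scores from three separate evaluation suites.
results = [
    LayerResult("behavioral", score=0.92, threshold=0.90),
    LayerResult("safety", score=0.99, threshold=0.95),
    LayerResult("adversarial", score=0.81, threshold=0.85),
]
report = evaluate_readiness(results)
# The adversarial layer misses its threshold, so the combined gate
# answers "not ready" even though the other two layers pass.
```

In a tracking-backed workflow, a record like `report` would be logged per run (for example as MLflow run metrics and artifacts) so the audit trail accumulates alongside the model lifecycle.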

Session Speakers


Carlos Condado

Sr. Product Marketing Manager
Red Hat
