Sponsored by: Red Hat | Why AI Evaluation Breaks at Scale — and How to Fix It End-to-End
Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Enterprise Technology |
| Technologies | AI/BI, Unity Catalog, Agent Bricks |
| Skill Level | Beginner |
Most teams evaluate AI in pieces: one script for agent behavior, another for safety, another for adversarial testing, with no shared results, no audit trail, and no defensible answer to "is this ready to deploy?" The accumulation of disconnected tools isn't a gap you can close by adding another tool; it's an organizational and architectural failure, and it compounds as models and agents move into regulated or high-stakes environments. This session confronts the gap directly. We present Evaluation-Driven Development (EDD), a structured methodology that connects behavioral, safety, and adversarial evaluation layers into a single, reproducible workflow with an audit trail your compliance team can actually use, and we demonstrate it with open-source tooling that integrates with MLflow. Whether you own the model, the platform, or the risk, this session gives you a shared framework for answering the question that matters: is this AI-enabled capability ready to be deployed?
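To make the idea of a single, reproducible evaluation record concrete, here is a minimal Python sketch that logs behavioral, safety, and adversarial results into one MLflow run. The evaluator functions (`run_behavioral_suite`, `run_safety_suite`, `run_adversarial_suite`) and the metric names are hypothetical placeholders for illustration only, not the session's actual tooling.

```python
# Minimal sketch: record all three evaluation layers in one MLflow run
# so the results share a single audit trail. The evaluator functions and
# metric names below are hypothetical placeholders, not the tooling
# demonstrated in the session.
import mlflow


def run_behavioral_suite(model_uri: str) -> dict:
    # Placeholder: run task/behavior evaluations, return metric -> score.
    return {"task_success_rate": 0.91, "groundedness": 0.87}


def run_safety_suite(model_uri: str) -> dict:
    # Placeholder: run safety/policy evaluations.
    return {"harmful_response_rate": 0.02, "refusal_accuracy": 0.95}


def run_adversarial_suite(model_uri: str) -> dict:
    # Placeholder: run red-team / jailbreak probes.
    return {"jailbreak_success_rate": 0.04}


def evaluate(model_uri: str, release_candidate: str) -> None:
    with mlflow.start_run(run_name=f"edd-eval-{release_candidate}"):
        mlflow.set_tag("release_candidate", release_candidate)
        mlflow.log_param("model_uri", model_uri)

        layers = {
            "behavioral": run_behavioral_suite(model_uri),
            "safety": run_safety_suite(model_uri),
            "adversarial": run_adversarial_suite(model_uri),
        }
        for layer, metrics in layers.items():
            for name, value in metrics.items():
                mlflow.log_metric(f"{layer}/{name}", value)

        # Persist the full result set as an artifact for the audit trail.
        mlflow.log_dict(layers, "evaluation_report.json")


if __name__ == "__main__":
    evaluate("models:/my-agent/3", release_candidate="2024.06")
```

Because every layer lands in the same run, a reviewer can answer "is this ready to deploy?" from one record instead of reconciling outputs from disconnected scripts.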
Session Speakers
Carlos Condado
Sr. Product Marketing Manager
Red Hat