Session
Dr. Jekyll and Mr. H-AI-de: Using MLflow and AI Judges to measure model alignment and safety
Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Enterprise Technology |
| Technologies | Unity Catalog, Agent Bricks |
| Skill Level | Intermediate |
Emergent misalignment sent shockwaves through the AI community in 2025. We discovered that a model fine-tuned on narrow tasks could suddenly pivot from a "helpful tool" into a "rogue actor." In early 2026, Nature warned that mistrained models "quickly go off the rails," making AI safety a non-negotiable for enterprise innovation. To build with confidence, we must move beyond blind trust toward empirical control.In this 20-minute lightning talk, we will demystify the process of assessing model integrity. We'll walk through the practical steps to:- Build custom judges designed to detect latent misalignment and goal-drift.- Deploy judges in MLflow to automate the evaluation of frontier open-source models.- Benchmark against emerging standards like HarmBench to quantify operational risk.Learn how to stay in control and inherently understand the internal "compass" of your models. The last thing your enterprise needs is an assistant that turns into a monster the moment you look away.
Session Speakers
Hendrik Frentrup
/Solution Architect
Databricks