Session

Dr. Jekyll and Mr. H-AI-de: Using MLflow and AI Judges to measure model alignment and safety

Overview

Experience	In Person
Track	Artificial Intelligence & Agents
Industry	Enterprise Technology
Technologies	Unity Catalog, Agent Bricks
Skill Level	Intermediate

Emergent misalignment sent shockwaves through the AI community in 2025. We discovered that a model fine-tuned on narrow tasks could suddenly pivot from a "helpful tool" into a "rogue actor." In early 2026, Nature warned that mistrained models "quickly go off the rails," making AI safety a non-negotiable for enterprise innovation. To build with confidence, we must move beyond blind trust toward empirical control.

In this 20-minute lightning talk, we will demystify the process of assessing model integrity. We'll walk through the practical steps to:

- Build custom judges designed to detect latent misalignment and goal-drift.
- Deploy judges in MLflow to automate the evaluation of frontier open-source models.
- Benchmark against emerging standards like HarmBench to quantify operational risk.

Learn how to stay in control and inherently understand the internal "compass" of your models. The last thing your enterprise needs is an assistant that turns into a monster the moment you look away.

Session Speakers

Hendrik Frentrup

Solution Architect
Databricks