Session

AI Evaluation from First Principles: You Can't Manage What You Can't Measure

Overview

Experience: In Person
Type: Breakout
Track: Artificial Intelligence
Industry: Enterprise Technology
Technologies: MLflow, Mosaic AI
Skill Level: Intermediate
Duration: 40 min
Is your AI evaluation process holding back your system's true potential? Many organizations struggle with improving GenAI quality because they don't know how to measure it effectively. This research session covers the principles of GenAI evaluation, offers a framework for measuring what truly matters, and demonstrates implementation using Databricks.

Key Takeaways:
- Practical approaches for establishing reliable metrics for subjective evaluations
- Techniques for calibrating LLM judges to enable cost-effective, scalable assessment
- Actionable frameworks for evaluation systems that evolve with your AI capabilities

Whether you're developing models, implementing AI solutions, or leading technical teams, this session will equip you to define meaningful quality metrics for your specific use cases and build evaluation systems that expose what's working and what isn't, transforming AI guesswork into measurable success.
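The abstract itself contains no code, but to make the "calibrated LLM judge" idea concrete, here is a minimal sketch using MLflow's GenAI evaluation API (MLflow 2.x). The metric name, grading prompt, example scores, and the `openai:/gpt-4` judge endpoint are illustrative assumptions, not material from the session.

```python
import mlflow
import pandas as pd
from mlflow.metrics.genai import EvaluationExample, make_genai_metric

# A graded example calibrates the LLM judge: it anchors the 1-5 scale
# so the judge scores new outputs consistently with human raters.
calibration_example = EvaluationExample(
    input="How do I log a model with MLflow?",
    output="Call mlflow.sklearn.log_model(model, 'model') inside an active run.",
    score=4,
    justification="Correct and actionable, but omits how to start a run.",
)

# Define a custom judge metric; the definition and grading prompt below are
# illustrative placeholders, not the session's actual rubric.
answer_quality = make_genai_metric(
    name="answer_quality",
    definition="How completely and accurately the response answers the question.",
    grading_prompt=(
        "Score 1 (unusable) to 5 (complete and accurate). "
        "Penalize hallucinated APIs and missing steps."
    ),
    examples=[calibration_example],
    model="openai:/gpt-4",            # judge model URI (assumed endpoint)
    parameters={"temperature": 0.0},  # deterministic judging
    aggregations=["mean", "variance"],
    greater_is_better=True,
)

# Evaluate a small static dataset of question/answer pairs.
eval_df = pd.DataFrame(
    {
        "inputs": ["How do I log a model with MLflow?"],
        "predictions": ["Use mlflow.sklearn.log_model inside mlflow.start_run()."],
    }
)

with mlflow.start_run():
    results = mlflow.evaluate(
        data=eval_df,
        predictions="predictions",
        extra_metrics=[answer_quality],
    )
    print(results.metrics)  # e.g. aggregated answer_quality mean and variance
```

The few-shot example passed to the judge is the calibration step: scores and justifications for known outputs pin down the rating scale before the judge is applied at scale.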

Session Speakers


Jonathan Frankle

Chief Scientist - Neural Networks
Databricks


Pallavi Koppol

Research Scientist
Databricks