Session

AI Evaluation from First Principles: You Can't Manage What You Can't Measure

Overview

Experience: In Person
Type: Breakout
Track: Artificial Intelligence
Industry: Enterprise Technology
Technologies: MLflow, Mosaic AI
Skill Level: Intermediate
Duration: 40 min
Is your AI evaluation process holding back your system's true potential? Many organizations struggle with improving GenAI quality because they don't know how to measure it effectively. This research session covers the principles of GenAI evaluation, offers a framework for measuring what truly matters, and demonstrates implementation using Databricks.

Key Takeaways:
- Practical approaches for establishing reliable metrics for subjective evaluations
- Techniques for calibrating LLM judges to enable cost-effective, scalable assessment
- Actionable frameworks for evaluation systems that evolve with your AI capabilities

Whether you're developing models, implementing AI solutions, or leading technical teams, this session will equip you to define meaningful quality metrics for your specific use cases and build evaluation systems that expose what's working and what isn't, transforming AI guesswork into measurable success.
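The abstract itself contains no code, but to make the "calibrated LLM judge" idea concrete, here is a minimal sketch using MLflow's GenAI evaluation API (MLflow 2.x). The metric name, grading prompt, example scores, and the `openai:/gpt-4` judge endpoint are illustrative assumptions, not material from the session.

```python
import mlflow
import pandas as pd
from mlflow.metrics.genai import EvaluationExample, make_genai_metric

# A graded example calibrates the LLM judge: it anchors the 1-5 scale
# so the judge scores new outputs consistently with human raters.
calibration_example = EvaluationExample(
    input="How do I log a model with MLflow?",
    output="Call mlflow.sklearn.log_model(model, 'model') inside an active run.",
    score=4,
    justification="Correct and actionable, but omits how to start a run.",
)

# Define a custom judge metric; the definition and grading prompt below are
# illustrative placeholders, not the session's actual rubric.
answer_quality = make_genai_metric(
    name="answer_quality",
    definition="How completely and accurately the response answers the question.",
    grading_prompt=(
        "Score 1 (unusable) to 5 (complete and accurate). "
        "Penalize hallucinated APIs and missing steps."
    ),
    examples=[calibration_example],
    model="openai:/gpt-4",            # judge model URI (assumed endpoint)
    parameters={"temperature": 0.0},  # deterministic judging
    aggregations=["mean", "variance"],
    greater_is_better=True,
)

# Evaluate a small static dataset of question/answer pairs.
eval_df = pd.DataFrame(
    {
        "inputs": ["How do I log a model with MLflow?"],
        "predictions": ["Use mlflow.sklearn.log_model inside mlflow.start_run()."],
    }
)

with mlflow.start_run():
    results = mlflow.evaluate(
        data=eval_df,
        predictions="predictions",
        extra_metrics=[answer_quality],
    )
    print(results.metrics)  # e.g. aggregated answer_quality mean and variance
```

The few-shot example passed to the judge is the calibration step: scores and justifications for known outputs pin down the rating scale before the judge is applied at scale.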

Session Speakers


Jonathan Frankle

Chief Scientist - Neural Networks
Databricks


Pallavi Koppol

Research Scientist
Databricks