AI Evaluation from First Principles: You Can't Manage What You Can't Measure
Overview
Experience | In Person
---|---
Type | Breakout
Track | Artificial Intelligence
Industry | Enterprise Technology
Technologies | MLflow, Mosaic AI
Skill Level | Intermediate
Duration | 40 min
Is your AI evaluation process holding back your system's true potential? Many organizations struggle to improve GenAI quality because they don't know how to measure it effectively. This research session covers the principles of GenAI evaluation, offers a framework for measuring what truly matters, and demonstrates implementation using Databricks.

Key Takeaways:

- Practical approaches for establishing reliable metrics for subjective evaluations
- Techniques for calibrating LLM judges to enable cost-effective, scalable assessment
- Actionable frameworks for evaluation systems that evolve with your AI capabilities

Whether you're developing models, implementing AI solutions, or leading technical teams, this session will equip you to define meaningful quality metrics for your specific use cases and build evaluation systems that expose what's working and what isn't, transforming AI guesswork into measurable success.
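To give a flavor of the kind of workflow the takeaways describe, here is a minimal sketch using MLflow 2.x's GenAI evaluation API: it defines a custom LLM-judge metric, calibrates it with a hand-labeled example so its scores track human judgment, and runs it over a small static dataset. The metric name, grading prompt, judge model, and data are illustrative assumptions for this sketch, not the speakers' actual setup.

```python
import mlflow
import pandas as pd
from mlflow.metrics.genai import EvaluationExample, make_genai_metric

# Hypothetical subjective-quality metric: an LLM judge scored 1-5.
# The calibration example anchors the judge to a human-labeled score.
faithfulness = make_genai_metric(
    name="faithfulness",
    definition=(
        "Faithfulness measures whether the answer is grounded in the "
        "provided context, without unsupported claims."
    ),
    grading_prompt=(
        "Score 1 if the answer contradicts the context, 3 if it is only "
        "partially supported, and 5 if every claim is directly supported."
    ),
    examples=[
        EvaluationExample(
            input="What is MLflow?",
            output="MLflow is an open source platform for the ML lifecycle.",
            score=5,
            justification="The claim is directly supported by the context.",
            grading_context={
                "context": "MLflow is an open source platform for "
                           "managing the ML lifecycle."
            },
        )
    ],
    model="openai:/gpt-4o",  # the judge model; swap in any supported endpoint
    grading_context_columns=["context"],
    greater_is_better=True,
)

# A tiny static evaluation set; in practice this comes from curated traces.
eval_df = pd.DataFrame({
    "inputs": ["What is MLflow?"],
    "predictions": ["MLflow manages the end-to-end ML lifecycle."],
    "context": [
        "MLflow is an open source platform for managing the ML lifecycle."
    ],
})

with mlflow.start_run():
    results = mlflow.evaluate(
        data=eval_df,
        predictions="predictions",
        extra_metrics=[faithfulness],
    )
    print(results.metrics)  # aggregate judge scores, logged to the run
```

Adding a handful of scored examples to the metric is the cheapest lever for calibration: it turns a vague rubric into something the judge applies consistently, which is what makes LLM-as-judge assessment scalable.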
Session Speakers
Jonathan Frankle
Chief Scientist - Neural Networks
Databricks
Pallavi Koppol
Research Scientist
Databricks