Session

Behind the Curtain: How We Do Eval in Genie

Overview

Experience: In Person
Track: Analytics & BI
Industry: Enterprise Technology
Technologies: AI/BI
Skill Level: Advanced

Evaluating AI systems is notoriously challenging, especially when correctness isn't binary. In this talk, we'll walk through how the Genie engineering team approaches evaluation at scale, from defining what "good" looks like to building reliable, automated eval pipelines. We'll cover how we combine offline benchmarks, human-in-the-loop validation, and production feedback loops, along with the tradeoffs we've encountered. The goal is to share practical patterns and lessons learned that can help teams move faster while maintaining trust in their models.

Session Speakers


Shanshan Zheng

Sr Manager, Engineering
Databricks