Behind the Curtain: How We Do Eval in Genie 

Evaluating AI systems is notoriously challenging, especially when correctness isn’t binary. In this talk, we’ll walk through how the Genie engineering team approaches evaluation at scale, from defining what “good” looks like to building reliable, automated eval pipelines. We’ll cover our mix of offline benchmarks, human-in-the-loop validation, and production feedback loops, along with the tradeoffs we’ve encountered. The goal is to share practical patterns and lessons learned that can help teams move faster while maintaining trust in their models.

Sr Manager, Engineering

Shanshan Zheng

Experience	In Person
Track	Analytics & BI
Industry	Enterprise Technology
Technologies	AI/BI
Skill Level	Advanced
