SESSION

Methods for Evaluating Your GenAI Application Quality


OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Generative AI
TECHNOLOGIES: Databricks Experience (DBX), AI/Machine Learning, GenAI/LLMs, MLflow
SKILL LEVEL: Intermediate
DURATION: 40 min

Ensuring the quality and reliability of Generative AI applications in production is paramount. This session dives into the comprehensive suite of tools Databricks provides, including inference tables, Lakehouse Monitoring, and MLflow, to support rigorous evaluation and quality assurance of model responses. Discover how to harness these components effectively to conduct both offline evaluations and real-time monitoring, ensuring your GenAI applications meet the highest standards of performance and reliability.
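
As a rough sketch of what offline evaluation with MLflow can look like (not the specific code shown in the session), the example below scores a small, static set of responses with mlflow.evaluate; in practice the rows could be sampled from an inference table, and the column names, example data, and metric choice here are illustrative assumptions.

import mlflow
import pandas as pd
from mlflow.metrics.genai import answer_correctness

# Hypothetical evaluation set; in production this could be sampled from an
# inference table that captures a serving endpoint's requests and responses.
eval_df = pd.DataFrame(
    {
        "inputs": [
            "What is MLflow?",
            "What do inference tables store?",
        ],
        "outputs": [
            "MLflow is an open source platform for managing the ML lifecycle.",
            "Inference tables store the requests and responses of a serving endpoint.",
        ],
        "ground_truth": [
            "MLflow is an open source platform for the machine learning lifecycle.",
            "Inference tables log endpoint requests and responses to a Delta table.",
        ],
    }
)

with mlflow.start_run(run_name="offline-genai-eval"):
    results = mlflow.evaluate(
        data=eval_df,                     # static predictions, so no model is passed
        predictions="outputs",
        targets="ground_truth",
        model_type="question-answering",  # enables MLflow's built-in QA metrics
        extra_metrics=[
            # LLM-judged metric; requires access to a judge model (defaults to an OpenAI model)
            answer_correctness(),
        ],
    )
    print(results.metrics)                # aggregate scores, also logged to the run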


We'll explore best practices for using LLMs as judges to assess response quality, integrating MLflow to track experiments and model versions, and leveraging the unique capabilities of inference tables and Lilac for enhanced model management and evaluation. You'll learn how to optimize your workflow and ensure your GenAI applications are robust, scalable, and aligned with your production goals.
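
To make the LLM-as-judge idea concrete, here is a minimal sketch of a custom judged metric built with MLflow's make_genai_metric; the metric name, grading rubric, example, and judge model below are illustrative assumptions rather than the specific setup covered in the talk.

from mlflow.metrics.genai import EvaluationExample, make_genai_metric

# Illustrative grading example shown to the judge model alongside the rubric.
example = EvaluationExample(
    input="What is Lakehouse Monitoring?",
    output="It tracks the quality of data and models registered in Unity Catalog.",
    score=5,
    justification="The answer is accurate and directly addresses the question.",
)

# Custom LLM-judged metric; the definition, prompt, and judge endpoint are assumptions.
groundedness = make_genai_metric(
    name="groundedness",
    definition="Whether the response stays on topic and is factually supported.",
    grading_prompt=(
        "Score 1 if the response is unsupported or off topic, 3 if it is "
        "partially supported, and 5 if every claim is well supported."
    ),
    examples=[example],
    model="openai:/gpt-4",    # judge model URI; point this at an endpoint you can reach
    greater_is_better=True,
)

# Passing extra_metrics=[groundedness] to mlflow.evaluate logs per-row scores and
# judge justifications with the run, so judged quality can be compared across
# experiments and model versions in the MLflow tracking UI.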

SESSION SPEAKERS

Alkis Polyzotis

Senior Staff Software Engineer
Databricks

Michael Carbin

Principal Researcher
Databricks