SESSION

Model Alignment at Scale using RL from AI Feedback on Databricks

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Generative AI
INDUSTRY: Media and Entertainment, Retail and CPG - Food, Financial Services
TECHNOLOGIES: AI/Machine Learning, GenAI/LLMs, MLflow
SKILL LEVEL: Advanced
DURATION: 40 min

Refining large language models to meet specific business objectives can be challenging. Traditional techniques such as on-the-fly tuning and supervised fine-tuning often fail to adapt LLMs to unique requirements, such as adherence to a strict code of conduct or serving niche markets. To address this, we'll show how Reinforcement Learning from AI Feedback (RLAIF) can be applied on Databricks using an open LLM as a reward model, minimizing the need for extensive human intervention in the ranking of outputs. In our session, we'll explore the structure of RLAIF, its practical use, and its advantages over traditional RLHF, including cost efficiency and operational simplicity. We'll back up our discussion with a demo showing how RLAIF effectively aligns LLMs with business-specific requirements in a simple use case. We'll conclude the session by summarizing the key takeaways and offering a perspective on the future of model alignment at scale.

SESSION SPEAKERS

Michael Shtelma

Lead Specialist Solutions Architect
Databricks