SESSION

AutoFeedback: Scaling Human Feedback with Custom Evaluation Models

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Science and Machine Learning
INDUSTRY: Enterprise Technology
TECHNOLOGIES: AI/Machine Learning, GenAI/LLMs
SKILL LEVEL: Intermediate
DURATION: 40 min

Human feedback plays a crucial role in evaluating the output of LLM applications. However, relying solely on human review is time-consuming and costly. To address this, we have developed AutoFeedback, a system that combines the strengths of human and model-based evaluation. We will discuss how our custom evaluation models, built using in-context learning and fine-tuning, can significantly improve the efficiency and accuracy of LLM evaluation. By training these models on human feedback data, we have achieved a 44% reduction in absolute error on a 7-point grading task. Our evaluation models can also generate explanations for their grades, which improves transparency and interpretability. Our synthetic bootstrapping procedure allows us to fine-tune models with as few as 25-50 human-labeled examples. The resulting model-generated feedback approaches the accuracy of models trained on larger datasets while costing more than 10x less than human annotation.
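As a rough illustration of the error metric mentioned above, the sketch below compares two sets of model grades against human grades on a 7-point scale. It assumes that "reduction in absolute error" means the relative drop in mean absolute error between a baseline grader and a tuned grader; the abstract does not spell out the exact definition. The function names and the sample grades are hypothetical and are not taken from the AutoFeedback system.

```python
def mean_absolute_error(human_grades, model_grades):
    """Average absolute gap between human and model grades."""
    if len(human_grades) != len(model_grades):
        raise ValueError("grade lists must have the same length")
    if not human_grades:
        raise ValueError("grade lists must not be empty")
    gaps = [abs(h - m) for h, m in zip(human_grades, model_grades)]
    return sum(gaps) / len(gaps)


def relative_reduction(baseline_error, improved_error):
    """Fraction by which the error dropped relative to the baseline."""
    if baseline_error == 0:
        raise ValueError("baseline error must be non-zero")
    return (baseline_error - improved_error) / baseline_error


# Hypothetical grades on a 1-7 scale, made up purely for illustration.
human = [7, 5, 3, 6, 2, 4]
baseline_model = [5, 5, 5, 4, 4, 4]  # e.g. an untuned grader
tuned_model = [6, 5, 4, 5, 2, 4]     # e.g. a grader tuned on human feedback

baseline_mae = mean_absolute_error(human, baseline_model)
tuned_mae = mean_absolute_error(human, tuned_model)
reduction = relative_reduction(baseline_mae, tuned_mae)

print(f"baseline MAE: {baseline_mae:.3f}")
print(f"tuned MAE: {tuned_mae:.3f}")
print(f"relative reduction: {reduction:.1%}")
```

A relative measure like this makes graders comparable across tasks with different baseline difficulty, although it says nothing about which grade levels the remaining error is concentrated in.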

SESSION SPEAKERS

Arjun Bansal

CEO & Co-founder
Log10