SESSION

LLM Evaluation: Auditing Fine-Tuned LLMs for Guaranteed Output Quality

OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Generative AI
INDUSTRY: Enterprise Technology
TECHNOLOGIES: AI/Machine Learning, GenAI/LLMs, MLflow
SKILL LEVEL: Intermediate
DURATION: 40 min

Information retrieval from e-commerce product data sheets is a complex challenge, and doing it manually can incur high costs. To perform this task, Mirakl developed a solution that leverages fine-tuned LLMs. Although LLMs have proven capable across a variety of tasks, they are far from perfect. Trained mainly on next-token prediction over a wide range of data, LLMs can generate incorrect output, caused at times by a lack of context in the prompt (e.g., the absence of chain-of-thought (CoT) prompting) or by resemblance to very common sequences. In this session, we will cover:

  • Qualitative evaluation: language model quality metrics and hallucination detection
  • Use of MLflow to automate the evaluation and monitoring of LLMs
  • Iterative quality improvement through prompt engineering strategies and dataset curation

These methods allowed us to iterate quickly on prompts and fine-tuned models until they were trustworthy enough for production.

SESSION SPEAKERS

Pierre Lourdelet

Data Scientist
Mirakl

Loic Pauletto

Data Scientist
Mirakl