Stop Guessing, Start Scoring: LLM Extraction Evaluation on Databricks
Overview
| Experience | In Person |
|---|---|
| Track | Artificial Intelligence & Agents |
| Industry | Consulting & Services |
| Technologies | Databricks Apps, Agent Bricks, Lakebase |
| Skill Level | Intermediate |
Speakers: Michelle JanneyCoyle (Databricks), Darshana Nair (CLA)

Robust evaluation is key to building reliable LLM-based systems. Together with CliftonLarsonAllen (CLA), a leading accounting and professional services firm, we developed a Databricks-native evaluation solution that centralizes metrics, ground truth, and expert feedback to enable repeatable offline evaluation of their SOC extraction workflow.

In this talk, we deconstruct our implementation, highlighting how we use managed MLflow, custom scorers, and a Databricks App backed by Lakebase to capture SME feedback and structured ground truth. You will learn how this framework makes quality measurable and provides a clear signal for future improvements.

Key Takeaways
- Automating evaluation using Databricks tools.
- Using Databricks Apps to bridge the gap between data science teams and domain experts.
- Lessons from evaluating complex semantic extraction and structured JSON outputs (see the sketch below).
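The abstract does not include code, but a custom scorer for structured JSON outputs of the kind it describes might look like the sketch below: a field-level exact-match score plus a separate parse-success signal. The function name, the SOC-style fields (`control_id`, `status`), and the scoring logic are illustrative assumptions, not CLA's actual implementation; in managed MLflow, a plain Python function with this shape can be registered as a custom scorer for repeatable offline evaluation.

```python
import json

def json_field_accuracy(prediction: str, ground_truth: dict) -> dict:
    """Score an LLM's structured JSON extraction against expert ground truth.

    Illustrative sketch only: the fields and metric design are assumptions,
    not the scorer described in the session.
    """
    try:
        extracted = json.loads(prediction)
    except json.JSONDecodeError:
        # Malformed JSON scores zero, but parse failures are reported
        # separately so they can be distinguished from wrong values.
        return {"parse_ok": 0.0, "field_accuracy": 0.0}

    fields = ground_truth.keys()
    correct = sum(1 for f in fields if extracted.get(f) == ground_truth[f])
    return {
        "parse_ok": 1.0,
        "field_accuracy": correct / len(fields) if fields else 1.0,
    }

# Example: scoring one extraction from a hypothetical SOC report record.
pred = '{"control_id": "CC6.1", "status": "operating effectively"}'
truth = {"control_id": "CC6.1", "status": "operating effectively"}
print(json_field_accuracy(pred, truth))  # {'parse_ok': 1.0, 'field_accuracy': 1.0}
```

Returning a dict of named scores rather than a single number keeps failure modes separable, so a drop in quality can be traced to malformed output versus incorrect field values.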
Session Speakers
Darshana Nair
Data Scientist Director
CLA
Michelle JanneyCoyle
AI Forward Deployed Engineer
Databricks