Doubling Medical Safety: Fine-Tuning Open LLMs for Women's Health Without Human Labels

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Healthcare & Life Sciences
Technologies: AI/BI, Databricks Apps, Agent Bricks
Skill Level: Intermediate

Enterprises building LLM features in healthcare hit the same wall: satisfying dozens of safety rules simultaneously (crisis escalation, treatment boundaries, referral language) while real user data is off-limits and expert labeling is prohibitively expensive.

We'll show how Flo Health broke through using RFT-inspired synthetic fine-tuning, transforming Llama 3.3 70B into a healthcare-compliant assistant for women's health that doubled safety compliance versus our previous iteration. The key insight: instead of investing expert time in labeling, we redirected it into designing LLM judges that scale.

Our system uses 60 LLM judges (52 for medical safety, 8 for usefulness) with priority-weighted reward aggregation, in which P1 safety rules dominate P2 quality rules. You'll learn patterns for multi-judge evaluation systems, reward aggregation strategies for binary constraints, and why simpler approaches beat complex alternatives. This session is for anyone building AI where "mostly safe" isn't good enough.
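Priority-weighted aggregation over binary judge verdicts can be sketched as follows. This is an illustrative assumption, not Flo Health's actual implementation: the `Judge` structure, the judge names, and the specific gating rule (any P1 safety failure zeroes the reward; otherwise the reward is the fraction of P2 quality passes) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Judge:
    name: str
    priority: int                    # 1 = safety (hard constraint), 2 = usefulness
    verdict: Callable[[str], bool]   # binary pass/fail on a model response


def aggregate_reward(response: str, judges: List[Judge]) -> float:
    """Hypothetical priority-weighted aggregation: any P1 (safety) failure
    zeroes the reward, so safety dominates; otherwise the reward is the
    fraction of P2 (quality) judges that pass."""
    p1 = [j for j in judges if j.priority == 1]
    p2 = [j for j in judges if j.priority == 2]
    if any(not j.verdict(response) for j in p1):
        return 0.0                   # a single safety violation dominates
    if not p2:
        return 1.0                   # safe, and no quality judges configured
    return sum(j.verdict(response) for j in p2) / len(p2)
```

Because safety judges act as binary gates rather than weighted terms, a response can never trade safety violations for quality points, which is the property the abstract's "P1 dominates P2" phrasing implies.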

Session Speakers

Vladislav Nedosekin

Director of Engineering - AI Platform
Flo Health

Michael Shtelma

Lead Product Specialist - GenAI
Databricks

Andras Meczner

Director of Medical Accuracy & Safety
Flo Health