SESSION

Cleanlab: The Data Curation Platform for AI, Automated Annotation, and Data Improvement

OVERVIEW

EXPERIENCE: In Person
TYPE: Lightning Talk
DURATION: 20 min

This talk addresses the two biggest problems facing AI practitioners today: (1) reliability and (2) the time and cost spent on high-quality data and annotations. Curtis covers the theory and algorithms behind Cleanlab, the data curation platform for AI used by more than 100 Fortune 500 companies to automate trust and improve data in their AI stack. These methods enabled Cleanlab to find and fix millions of errors in the 10 most-benchmarked ML datasets, such as MNIST, ImageNet, Dolly, and Amazon Reviews. Curtis also shares lessons learned from ten years of working with LLMs, ML, and AI solutions at companies like Google, Amazon, Meta, Oculus, and Microsoft, along with real-world industry examples of models improved, millions saved, and the number of annotations needed reduced by as much as 98%. The solutions covered are data- and model-agnostic and can be tailored to arbitrary domain-specific use cases, so they work for both current models and future models that haven't yet been invented, like GPT-6.

SESSION SPEAKERS

Curtis Northcutt

CEO
Cleanlab