Data + AI Summit 2022

Cleanlab: AI to Find and Fix Errors in ML Datasets

On Demand

  • Session
  • In-Person
  • Data Science, Machine Learning and MLOps
  • Intermediate
  • Moscone South | Level 3 | 314
  • 35 min

Real-world datasets contain a large fraction of errors, which degrade model quality and distort benchmarking. This talk presents Cleanlab, an open-source tool that addresses these issues using the latest research in data-centric AI. Cleanlab has been used to improve datasets at a number of Fortune 500 companies.

Ontological issues, invalid data points, and label errors are pervasive in datasets. Even gold-standard ML datasets contain, on average, 3.3% label errors (labelerrors.com). These errors degrade model quality, lead to incorrect conclusions about model performance, and can cause suboptimal models to be deployed.
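To make the benchmarking point concrete, here is a toy illustration using hypothetical, hand-made labels and predictions (not data from the talk or from labelerrors.com). Two models are scored on a 10-example test set twice: once against the labels as given, two of which are wrong, and once against the corrected labels. The ranking of the two models flips.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the reference labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical 10-example binary test set.
given_labels = [0, 1, 1, 0, 1, 0, 1, 0, 1, 0]  # as annotated
true_labels = [0, 1, 0, 0, 1, 1, 1, 0, 1, 0]   # indices 2 and 5 were mislabeled

# Predictions from two hypothetical models.
model_a = [0, 1, 1, 0, 1, 0, 1, 0, 0, 0]  # repeats both annotation errors
model_b = [0, 1, 0, 0, 1, 1, 1, 1, 1, 0]  # gets indices 2 and 5 right

for name, preds in [("A", model_a), ("B", model_b)]:
    print(
        f"model {name}: "
        f"acc vs given labels = {accuracy(preds, given_labels):.1f}, "
        f"acc vs corrected labels = {accuracy(preds, true_labels):.1f}"
    )
# model A: acc vs given labels = 0.9, acc vs corrected labels = 0.7
# model B: acc vs given labels = 0.7, acc vs corrected labels = 0.9
```

In this toy case, the model that happens to reproduce the annotators' mistakes looks best on the noisy test set, so selecting by noisy-label accuracy would deploy the worse model.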

We present the cleanlab open-source package (github.com/cleanlab/cleanlab) for finding and fixing data errors. We will walk through using Cleanlab to fix errors in a real-world dataset, with an end-to-end demo showing how Cleanlab improves both the dataset and the resulting model's performance.
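As a rough picture of what finding label errors involves, the sketch below follows the confident-learning idea that cleanlab builds on: compare each example's given label with out-of-sample predicted class probabilities, using a per-class threshold equal to the average predicted probability of that class among examples given that label. It is a simplified, standard-library-only illustration, not cleanlab's implementation. In cleanlab 2.x the corresponding entry point is `cleanlab.filter.find_label_issues(labels, pred_probs)`, which offers several filtering rules beyond the simple one used here. The helper name `find_label_issues_sketch` and the example data are hypothetical.

```python
from typing import List, Sequence


def find_label_issues_sketch(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]]
) -> List[int]:
    """Return indices whose given label disagrees with the class the model
    confidently predicts (simplified confident-learning rule; hypothetical helper)."""
    n_classes = len(pred_probs[0])

    # Per-class threshold t_j: mean predicted probability of class j over the
    # examples whose given label is j. Classes with no examples get t_j = 1.0.
    thresholds = []
    for j in range(n_classes):
        probs_j = [p[j] for p, y in zip(pred_probs, labels) if y == j]
        thresholds.append(sum(probs_j) / len(probs_j) if probs_j else 1.0)

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose predicted probability reaches their own threshold.
        confident = [j for j in range(n_classes) if p[j] >= thresholds[j]]
        if not confident:
            continue  # the model is not confident about any class; skip
        # Among the confident classes, keep the most probable one.
        likely_true = max(confident, key=lambda j: p[j])
        if likely_true != y:
            issues.append(i)
    return issues


# Hypothetical given labels and out-of-sample predicted probabilities.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],  # labeled 0, but the model confidently predicts 1
    [0.2, 0.8],
    [0.3, 0.7],
    [0.9, 0.1],  # labeled 1, but the model confidently predicts 0
]

print(find_label_issues_sketch(labels, pred_probs))  # [2, 5]
```

Here the examples at indices 2 and 5 are flagged because the model confidently predicts the other class. In practice `pred_probs` should come from a model that did not train on the example being scored (for instance via cross-validation); otherwise the model can memorize the wrong labels and hide them.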

Finally, we will show Cleanlab Studio, which provides a web interface for human-in-the-loop data quality control.

Session Speakers

Curtis Northcutt


