Data + AI Summit 2022

Cleanlab: AI to Find and Fix Errors in ML Datasets

On Demand

Type

  • Session

Format

  • In-Person

Track

  • Data Science, Machine Learning, MLOps

Difficulty

  • Intermediate

Room

  • Moscone South | Level 3 | 314

Duration

  • 35 min

Overview

Real-world datasets contain a large fraction of errors, which degrade model quality and distort benchmarking. This talk presents Cleanlab, an open-source tool that addresses these issues using the latest research in data-centric AI. Cleanlab has been used to improve datasets at a number of Fortune 500 companies.



Ontological issues, invalid data points, and label errors are pervasive in datasets. Even gold-standard ML datasets have on average 3.3% label errors (labelerrors.com). These errors degrade model quality and lead to incorrect conclusions about model performance, so suboptimal models end up being deployed.



We present the cleanlab open-source package (github.com/cleanlab/cleanlab) for finding and fixing data errors. We will walk through using Cleanlab to fix errors in a real-world dataset, with an end-to-end demo of how Cleanlab improves a dataset and model performance.
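To give a rough sense of what "finding label errors" can look like in practice, here is a minimal sketch in plain Python. It is not the cleanlab API and not the demo from the talk. It is a simplified take on the per-class thresholding idea from confident learning, the research that cleanlab builds on. The function name `find_likely_label_errors` and the toy data are hypothetical, and the predicted probabilities are assumed to come from some classifier, ideally out-of-sample (for example via cross-validation).

```python
from collections import defaultdict


def find_likely_label_errors(labels, pred_probs):
    """Flag examples whose given label disagrees with a confident prediction.

    labels: list of int class indices (the possibly noisy given labels).
    pred_probs: list of per-class probability lists, one per example.
    Returns a list of (index, given_label, suggested_label) tuples.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: the mean predicted probability of class k,
    # taken over the examples that are labeled k.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y, p in zip(labels, pred_probs):
        sums[y] += p[y]
        counts[y] += 1
    thresholds = [
        sums[k] / counts[k] if counts[k] else 1.0 for k in range(n_classes)
    ]

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes whose predicted probability clears that class's threshold.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            suggested = max(confident, key=lambda k: p[k])
            if suggested != y:
                issues.append((i, y, suggested))
    return issues


# Toy data (hypothetical): example 2 is labeled 0 but looks like class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
issues = find_likely_label_errors(labels, pred_probs)
print(issues)
```

Flagged examples are candidates for human review rather than confirmed errors, which is the kind of human-in-the-loop step the session describes.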



Finally, we will show Cleanlab Studio, which provides a web interface for human-in-the-loop data quality control.

Session Speakers

Curtis Northcutt

CEO

Cleanlab
