Data + AI Summit 2022
Watch on demand

Lessons Learned from Deidentifying 700 Million Patient Notes

On Demand

Type

  • Session

Format

  • Hybrid

Track

  • Data Science, Machine Learning and MLOps

Industry

  • Healthcare and Life Sciences

Difficulty

  • Intermediate

Room

  • Moscone South | Upper Mezzanine | 156

Duration

  • 35 min

Overview

Providence embarked on an ambitious journey to de-identify all our clinical electronic medical record (EMR) data to support medical research and the development of novel treatments. This talk shares how this was done for patient notes and how you can achieve the same.

First, we built a de-identification pipeline using pre-trained deep learning models fine-tuned on our own data. We then developed an innovative methodology to evaluate re-identification risk, because American healthcare law (HIPAA) requires that de-identified data carry a "very low" risk of re-identification but does not specify a standard for measuring it. Our next challenge was to annotate a dataset large enough to produce meaningful statistics and to improve the fine-tuning of our model. Finally, through experimentation and iteration, we reached a level of performance that safeguards patient privacy while minimizing information loss. Our technology partner provided the computing power to efficiently process hundreds of millions of historical records as well as incremental daily loads.
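The abstract does not say which metrics were used to balance privacy against information loss. As a minimal sketch, assuming gold and predicted PHI annotations are stored as character-offset spans, one common way to express that trade-off is character-level recall (the share of annotated PHI that was masked, the privacy side) and precision (the share of masked text that was actually PHI, the information-loss side). The function names below are hypothetical and this is not the speakers' evaluation methodology.

```python
# Hypothetical sketch: character-level recall/precision for PHI masking.
# Assumes annotations are lists of (start, end) character offsets, end exclusive.


def char_set(spans):
    """Expand (start, end) spans into the set of character offsets they cover."""
    covered = set()
    for start, end in spans:
        covered.update(range(start, end))
    return covered


def phi_recall_precision(gold_spans, predicted_spans):
    """Return (recall, precision) over PHI characters.

    Recall approximates privacy protection (missed PHI lowers it);
    precision approximates information loss (over-masking lowers it).
    An empty denominator is treated as a perfect score by convention.
    """
    gold = char_set(gold_spans)
    pred = char_set(predicted_spans)
    true_positive = len(gold & pred)
    recall = true_positive / len(gold) if gold else 1.0
    precision = true_positive / len(pred) if pred else 1.0
    return recall, precision


if __name__ == "__main__":
    gold = [(0, 10), (20, 30)]                 # two annotated PHI spans
    predicted = [(0, 10), (20, 25), (40, 50)]  # one partial hit, one false alarm
    r, p = phi_recall_precision(gold, predicted)
    print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.75 precision=0.60
```

In a de-identification setting, recall is usually the metric that must clear a threshold, while precision is tuned afterward to limit how much clinically useful text is masked.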

Through this endeavor, we have learned many lessons that we will share:

- Evaluating risk of reidentification to meet HIPAA requirements
- Annotating samples of data to create labeled datasets
- Performing experiments and evaluating performance
- Fine-tuning pre-trained models with your own data
- Augmenting models with rules and other tricks
- Optimizing clusters to process very large volumes of text data
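The session does not publish its rules. As a minimal sketch of one common way to augment a model with rules, assuming the model emits labeled character spans, the snippet below unions model spans with regex matches and replaces each merged span with a `[LABEL]` placeholder. The patterns, labels, and function names are hypothetical illustrations; real PHI rules would be far broader.

```python
import re

# Hypothetical rule patterns for illustration only; not the speakers' rules.
RULES = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b"),
}


def rule_spans(text):
    """Return (start, end, label) spans found by the regex rules."""
    spans = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return spans


def merge_spans(spans):
    """Union overlapping or touching spans; the earliest-starting span's label wins."""
    merged = []
    for start, end, label in sorted(spans):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


def redact(text, model_spans):
    """Combine model spans with rule spans and replace each with [LABEL]."""
    spans = merge_spans(list(model_spans) + rule_spans(text))
    pieces = []
    cursor = 0
    for start, end, label in spans:
        pieces.append(text[cursor:start])  # keep the non-PHI text before the span
        pieces.append(f"[{label}]")        # mask the PHI span itself
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    note = "John Smith called from 206-555-0134 about MRN 12345678."
    model_output = [(0, 10, "NAME")]  # stand-in for a fine-tuned model's prediction
    print(redact(note, model_output))
    # [NAME] called from [PHONE] about [MRN].
```

Taking the union of model and rule spans favors recall: a rule can only add masking, never remove something the model flagged.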

We will also present speed and throughput metrics from running our pipeline, which you can use to benchmark similar projects.

Session Speakers

Nadaa Taiyab

Senior Data Scientist

Tegria

Lindsay Mico

Director of Data Science

Providence Health