Solution Accelerator

Automated PHI Removal

Pre-built code, sample data and step-by-step instructions ready to go in a Databricks notebook

Detect and protect sensitive patient data with NLP

HIPAA requires organizations to limit access to Protected Health Information (PHI). Removing PHI from unstructured data such as images and PDFs is difficult and labor-intensive when done by hand. Our joint Solution Accelerator with John Snow Labs automates the detection of sensitive information in unstructured data using NLP models built for healthcare. Extracted data is stored in the Lakehouse, where teams can use the pre-trained models to remove, obfuscate or mask PHI before downstream analytics at scale.

Convert unstructured data like PDFs to structured text with OCR models
Easily detect PHI using pre-trained NLP models for healthcare
Automatically remove or de-identify PHI for downstream analysis
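
To make the detect-then-mask steps above concrete, here is a minimal Python sketch. It is not the accelerator's code: the notebook relies on pre-trained healthcare NLP models, while this example swaps in a few simple regular expressions so it can run anywhere. The names `PHI_PATTERNS`, `detect_phi` and `mask_phi`, as well as the sample note, are hypothetical and exist only for illustration.

```python
import re

# Hypothetical stand-in for a PHI detector. The accelerator uses pre-trained
# NLP models; these regexes only illustrate the detect-then-mask flow.
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_phi(text):
    """Return (start, end, label) spans sorted by start offset."""
    spans = []
    for label, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def mask_phi(text, spans):
    """Replace each detected span with its entity label, e.g. <DATE>."""
    pieces = []
    cursor = 0
    for start, end, label in spans:
        pieces.append(text[cursor:start])  # keep the non-PHI text
        pieces.append(f"<{label}>")        # mask the PHI span
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Made-up sample note containing no real patient data.
note = "Patient seen on 03/14/2021. Call 555-123-4567. SSN 123-45-6789."
spans = detect_phi(note)
masked = mask_phi(note, spans)
print(masked)
```

Keeping detection separate from masking means the same spans could instead be dropped entirely or replaced with realistic surrogate values (obfuscation), depending on what downstream analysis needs.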

Resources

Blog

On-demand workshop

Blog

Applying Natural Language Processing to Healthcare Text at Scale
