
Solution Accelerator

Automated PHI Removal

Pre-built code, sample data and step-by-step instructions ready to go in a Databricks notebook


Detect and protect sensitive patient data with NLP

HIPAA requires organizations to limit access to Protected Health Information (PHI). Removing PHI from unstructured data such as images and PDFs is challenging and, when done by hand, labor-intensive. Our joint Solution Accelerator with John Snow Labs automates the detection of sensitive information in unstructured data using NLP models built for healthcare. Extracted data is stored in the Lakehouse, where teams can use the pre-trained models to remove, obfuscate or mask PHI for downstream analytics at scale.

  • Convert unstructured data like PDFs to structured text with OCR models
  • Easily detect PHI using pre-trained NLP models for healthcare
  • Automatically remove or de-identify PHI for downstream analysis
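
As a rough illustration of the detect-then-mask idea in the steps above, the sketch below uses plain regular expressions as a stand-in for the pre-trained healthcare NLP models. It is not the accelerator's code and does not use the John Snow Labs libraries. The pattern set, the function names `detect_phi` and `mask_phi`, and the sample note are all hypothetical, and a handful of regexes would miss most real PHI (names, addresses, free-text identifiers).

```python
import re

# Hypothetical, illustrative patterns only. Real PHI detection relies on
# trained NLP models; these regexes cover just a few rigidly formatted fields.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def detect_phi(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    found = []
    for label, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group()))
    return found


def mask_phi(text):
    """Replace every pattern hit with a placeholder such as <SSN>."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


# Made-up sample note, standing in for text produced by an OCR step.
note = (
    "Patient seen on 03/14/2021, SSN 123-45-6789, "
    "call 555-867-5309 or jane.doe@example.com."
)

print(detect_phi(note))
print(mask_phi(note))
# prints: Patient seen on <DATE>, SSN <SSN>, call <PHONE> or <EMAIL>.
```

Masking with a typed placeholder keeps the shape of the sentence intact for downstream analytics. Obfuscation, by contrast, would swap each hit for a realistic fake value of the same type.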







