Session

Redact at the Speed of AI: De‑identifying PII in the Lakehouse with ai_query

Overview

Experience: In Person
Track: Artificial Intelligence & Agents
Industry: Enterprise Technology
Technologies: Unity Catalog, Lakebase
Skill Level: Intermediate
Teams need to unlock data responsibly. This lightning talk shows how to use AI Functions, specifically ai_query, to automatically redact PII across both structured tables and unstructured documents in Databricks. We'll walk through how it works for tabular data, then show how the same pattern extends to free-text fields like clinical notes. We'll also touch on batching for throughput and on checking output quality. A short live demo runs against realistic unstructured healthcare notes, de-identifying patient information so analysts and ML teams can experiment without exposing it. You'll leave with reusable queries, sample prompts for both structured and unstructured redaction, and a practical starting point to operationalize PII redaction in your Lakehouse.

Session Speakers

Han Tran

Solutions Architect
Databricks