• Databricks Data Classification makes it easy to continuously discover sensitive data and eliminate compliance blind spots across your entire data estate.
• Data classification leverages agentic AI to automatically identify and tag PII at scale, keeping sensitive data visible, auditable, and governed as new tables and columns are created.
• Teams can use data classification to automate protection with ABAC, enforce consistent access policies, and confidently share data without increasing risk.
As organizations scale their data platforms, sensitive information often hides in plain sight. New tables land every day, the regulatory landscape grows increasingly complex, and the stakes are higher than ever. According to the GDPR Enforcement Tracker Report, GDPR fines alone exceeded €5.6 billion in 2025, a growth of €1.17 billion since 2024.
Manual discovery methods simply don’t scale. What worked for hundreds of tables fails at thousands. The result? Compliance blind spots, costly audits, and stalled democratization of data. The fundamental problem is that you simply can’t protect what you can’t find.
Today, we’re excited to announce the Public Preview of Databricks Data Classification on AWS, Azure, and GCP.
Data Classification uses an agentic AI system to automatically discover and tag sensitive data across all your catalogs. It provides continuous visibility into where personally identifiable information (PII) resides, enabling you to stay compliant, automate protection, and confidently share data across teams, even as your data grows.
Data Classification delivers comprehensive, automated PII detection across our expanding data environment, ensuring sensitive information is clearly identified and enabling consistent protection. This approach not only helps secure sensitive assets but also reduces manual workloads. As we're rolling this out more broadly, we're looking forward to freeing up our teams for higher-value initiatives. — Gregg Rinsler, Sr. Director of Data Governance, FanDuel
With automated classification in place, your teams can shift from manual classification to strategic governance:
Every data team's currency is trust, which is consistency over time. Data Classification helps deliver that trust by scanning our data estate for PII and automating remediation workflows. The result is verified, compliant data that teams can confidently rely upon. — Sam Shah, VP of Engineering, Databricks Data Team

Data Classification is designed to bring automated, agentic discovery that covers all your data. Here’s how we do it:
Agentic AI for precise classification: Combines proven pattern recognition, metadata, and large language models, achieving up to 60% higher accuracy than regex-only tools. Your data never leaves your environment, in line with Databricks AI security controls (AWS | Azure | GCP).
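To make the hybrid approach concrete, here is a minimal sketch of how pattern matching and an LLM signal can be combined: regex patterns handle well-structured values, and an LLM judgment on metadata (stubbed here as a precomputed score) arbitrates when patterns are inconclusive. The tag names follow this post's examples; the function, thresholds, and LLM stub are illustrative assumptions, not the actual Databricks implementation.

```python
import re

# Illustrative regex patterns for structured PII values.
PATTERNS = {
    "class.email_address": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "class.phone_number": re.compile(r"^\+?[\d\s().-]{7,15}$"),
}

def classify_column(name, sample_values, llm_score=None):
    """Return (tag, confidence) for a column, or (None, 0.0).

    Patterns run first; an LLM score over metadata (column name,
    comments) is a hypothetical fallback for ambiguous columns.
    """
    for tag, pattern in PATTERNS.items():
        matches = sum(bool(pattern.match(v)) for v in sample_values)
        if sample_values and matches / len(sample_values) >= 0.8:
            return tag, matches / len(sample_values)
    if llm_score is not None and llm_score >= 0.7:
        return "class.name", llm_score  # hypothetical LLM verdict
    return None, 0.0

tag, conf = classify_column("email", ["a@ex.com", "b@ex.org", "c@ex.net"])
```

In a real system the LLM step is what lifts accuracy on free-text and oddly named columns, where regex alone produces the false positives and misses that the 60% figure refers to.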
Efficient and intelligent scanning for enterprise scale: Scans your entire catalog once, then rescans only new or changed tables and columns. Unity Catalog lineage prioritizes critical datasets for incremental scanning, so PII is caught as it appears. Since our initial Beta launch, we’ve significantly improved detection speed and reduced scanning costs by up to 75%. This system is battle-tested to ensure high performance as your data platform grows.
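The incremental strategy above can be sketched as a simple planner: scan everything on the first pass, then select only tables that are new or whose version has advanced since the last scan. The table registry and version numbers here are hypothetical stand-ins for what Unity Catalog tracks internally.

```python
def plan_scan(tables, last_scanned):
    """Return the subset of tables that need (re)scanning.

    tables: dict of table name -> current version
    last_scanned: dict of table name -> version at last scan
    """
    to_scan = []
    for name, version in tables.items():
        if name not in last_scanned:        # new table: needs a full scan
            to_scan.append(name)
        elif version > last_scanned[name]:  # changed since last scan
            to_scan.append(name)
    return to_scan

current = {"sales.orders": 12, "sales.customers": 7, "hr.employees": 3}
previous = {"sales.orders": 12, "sales.customers": 5}
# Only the changed and new tables are rescanned; unchanged ones are skipped.
assert plan_scan(current, previous) == ["sales.customers", "hr.employees"]
```

Skipping unchanged tables is where the cost savings come from: at enterprise scale, the vast majority of tables are untouched between scans.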
Review and validation: Get complete visibility into the columns containing PII and who currently has access to them. Our focused review UI surfaces high-confidence detections with sample data, letting you easily bulk-apply tags. Full results are stored in system tables for custom reporting or tagging.
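Because full results land in system tables, you can build your own review logic on top of them. The sketch below assumes a hypothetical record shape for detection results (the actual system table and column names may differ; consult the documentation) and filters for high-confidence detections worth bulk-tagging.

```python
# Hypothetical shape of classification results pulled from system tables.
detections = [
    {"table": "sales.customers", "column": "email",
     "tag": "class.email_address", "confidence": 0.97},
    {"table": "sales.customers", "column": "notes",
     "tag": "class.name", "confidence": 0.41},
    {"table": "hr.employees", "column": "phone",
     "tag": "class.phone_number", "confidence": 0.93},
]

def high_confidence(detections, threshold=0.9):
    """Select detections confident enough to bulk-apply as tags."""
    return [d for d in detections if d["confidence"] >= threshold]

to_tag = high_confidence(detections)
# Each selected detection could then be applied as a column tag,
# e.g. via an ALTER TABLE ... SET TAGS statement in SQL.
```

Lower-confidence detections (like the 0.41 example above) are exactly what the review UI is for: a human confirms or dismisses them before any tag is applied.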
Data Classification is transforming our compliance approach by automating PII detection. We use classification results along with an authorization workflow via Databricks Apps to enable Just-In-Time access controls. This allows us to keep sensitive data accessible only when needed. We eliminated the manual effort and instead created automated detection and protection across all of our data residing in the Databricks Platform. — Abhijit Joshi, Staff Data Engineer, Oportun

Once you know where sensitive data lives, it’s easier to protect, and access can scale safely.
Scale governance with ABAC policies: Attribute-Based Access Control (ABAC) policies automatically mask or encrypt sensitive columns. For example, set up a policy that masks all columns tagged as `class.name`, `class.email_address`, and `class.phone_number` for everyone except your security team. Once configured, this policy automatically applies to data tagged as sensitive, ensuring consistent data protection that scales with your business.

Use ABAC to securely open up access: Consider the customer transactions table in the example above, which might contain both sensitive columns (e.g., customer_name, email, phone) and non-sensitive columns (e.g., transaction_id or customer_id). ABAC policies mask only the sensitive columns while leaving non-sensitive fields open. There is no need to block entire tables or maintain complex view logic.
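The column-level behavior described above can be sketched as follows: columns whose tags fall in the sensitive set are masked unless the caller belongs to an exempt group, while untagged columns pass through untouched. The tag names mirror this post's examples; the policy function, group names, and mask string are illustrative assumptions, not the ABAC engine itself.

```python
# Tags treated as sensitive, matching the policy example in this post.
SENSITIVE_TAGS = {"class.name", "class.email_address", "class.phone_number"}

def apply_policy(row, column_tags, user_groups, exempt_group="security-team"):
    """Return the row with sensitive columns masked unless the user is exempt."""
    if exempt_group in user_groups:
        return dict(row)  # security team sees everything
    return {
        col: "****" if column_tags.get(col) in SENSITIVE_TAGS else value
        for col, value in row.items()
    }

tags = {"customer_name": "class.name", "email": "class.email_address",
        "transaction_id": None, "customer_id": None}
row = {"transaction_id": "T-1001", "customer_id": "C-42",
       "customer_name": "Ada Lovelace", "email": "ada@example.com"}

masked = apply_policy(row, tags, user_groups={"analysts"})
# transaction_id and customer_id pass through; customer_name and email are masked
```

This is why ABAC scales better than view logic: the policy is written once against tags, and any newly classified column is covered automatically without touching the table or its consumers.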

Here's what's on our roadmap in the coming months:
Ready to transform manual processes into automated Data Classification? Get started with our resources below:
