Unity Catalog is a unified governance solution for all data and AI assets, including files, tables, machine learning models and dashboards, in your lakehouse on any cloud.
Centralized governance for data and AI
Built-in data search and discovery
Performance and scale
Automated lineage for all workloads
Integrated with your existing tools
How it works
Centrally manage and govern all data assets
A common governance model based on ANSI SQL, an open standard, simplifies governance for files, tables, dashboards and ML models on any cloud. Define access policies once at the account level and enforce them across all workloads and workspaces. Unity Catalog also provides centralized, fine-grained auditing: it captures an audit log of actions performed against the data, which helps you meet your compliance and audit requirements.
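As a rough illustration of "define once, enforce everywhere", the sketch below renders standard SQL GRANT statements from a single policy description. The catalog, schema, table and group names are hypothetical, and the helper functions are assumptions made for this example, not part of any Databricks library.

```python
from typing import Iterable, List, Sequence

# A small allow-list keeps free-form text out of the generated SQL.
PRIVILEGES = {"SELECT", "MODIFY", "USE CATALOG", "USE SCHEMA", "ALL PRIVILEGES"}
SECURABLES = {"CATALOG", "SCHEMA", "TABLE"}


def quote_ident(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def grant_statements(
    privilege: str,
    securable_type: str,
    name_parts: Sequence[str],
    principals: Iterable[str],
) -> List[str]:
    """Render one GRANT statement per principal for a single securable."""
    if privilege not in PRIVILEGES:
        raise ValueError(f"unsupported privilege: {privilege}")
    if securable_type not in SECURABLES:
        raise ValueError(f"unsupported securable type: {securable_type}")
    target = ".".join(quote_ident(part) for part in name_parts)
    return [
        f"GRANT {privilege} ON {securable_type} {target} TO {quote_ident(p)};"
        for p in principals
    ]


if __name__ == "__main__":
    # Hypothetical three-level name (catalog.schema.table) and group names.
    for stmt in grant_statements(
        "SELECT", "TABLE", ("main", "sales", "orders"), ["analysts", "data-eng"]
    ):
        print(stmt)
```

The generated statements could then be run once from any SQL-capable client; because the policy lives in the catalog rather than in a single workspace, it does not need to be repeated per workspace.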
Manage fine-grained access controls
Use standard SQL functions to define row filters and column masks, giving you fine-grained access control over individual rows and columns. Upcoming attribute-based access controls will enable you to define access policies based on custom tags (attributes).
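A minimal sketch of what row filters and column masks can look like, assuming hypothetical tables `sales` and `users`, hypothetical groups `admin` and `hr`, and illustrative function names `us_filter` and `ssn_mask`. The SQL is held in Python strings so that it can be handed to whatever executes SQL in your environment (for example `spark.sql` in a notebook).

```python
from typing import Callable, Iterable

ROW_FILTER_SQL = [
    # Filter function: members of 'admin' see every row, others only US rows.
    """CREATE OR REPLACE FUNCTION us_filter(region STRING)
       RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), true, region = 'US')""",
    # Attach the filter to the table, binding it to the 'region' column.
    "ALTER TABLE sales SET ROW FILTER us_filter ON (region)",
]

COLUMN_MASK_SQL = [
    # Mask function: members of 'hr' see the real value, others a redaction.
    """CREATE OR REPLACE FUNCTION ssn_mask(ssn STRING)
       RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('hr') THEN ssn
                   ELSE '***-**-****' END""",
    # Attach the mask to a single column.
    "ALTER TABLE users ALTER COLUMN ssn SET MASK ssn_mask",
]


def apply_policies(run_sql: Callable[[str], object], statements: Iterable[str]) -> None:
    """Execute each statement in order using the supplied SQL runner."""
    for statement in statements:
        run_sql(statement)


if __name__ == "__main__":
    # Dry run: print the statements instead of executing them.
    apply_policies(print, ROW_FILTER_SQL + COLUMN_MASK_SQL)
```

In a notebook where a `spark` session is available, `apply_policies(spark.sql, ROW_FILTER_SQL + COLUMN_MASK_SQL)` would execute the same statements; passing the runner as a parameter keeps the sketch testable without a cluster.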
Unified and secure data search experience
Quickly find, understand and reference relevant data from across your data estate with a unified data search experience for data analysts, data engineers and data scientists. Data search in Unity Catalog is secure by default: search results are limited by each user's access privileges, which adds a further layer of security for privacy considerations.
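The idea of privilege-limited search can be modeled in a few lines. This is a conceptual sketch only, not how Unity Catalog implements search; the `Asset` type, the grants mapping and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Asset:
    full_name: str
    description: str


def search(
    assets: Iterable[Asset],
    grants: Dict[str, Set[str]],
    user_groups: Set[str],
    term: str,
) -> List[str]:
    """Return names of matching assets that the user is allowed to see.

    The access check runs before the text match, so assets without a
    grant never influence the result, not even by their existence.
    """
    needle = term.lower()
    results = []
    for asset in assets:
        allowed_groups = grants.get(asset.full_name, set())
        if not (allowed_groups & user_groups):
            continue  # no shared group: the asset stays invisible
        if needle in asset.full_name.lower() or needle in asset.description.lower():
            results.append(asset.full_name)
    return results


if __name__ == "__main__":
    assets = [
        Asset("main.sales.orders", "Customer orders"),
        Asset("main.hr.salaries", "Employee salary orders and adjustments"),
    ]
    grants = {"main.sales.orders": {"analysts"}, "main.hr.salaries": {"hr"}}
    print(search(assets, grants, {"analysts"}, "orders"))
```

Both hypothetical assets match the term "orders", but a user who belongs only to `analysts` gets back just the table that group has been granted.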
Enhanced query performance at any scale
Unity Catalog provides enhanced query performance with low-latency metadata serving and table auto-tuning, resulting in faster query execution at any scale. Asynchronous automatic data compaction runs in the background to optimize file sizes and reduce input/output (I/O) latency.
Automated and real-time data lineage
Gain end-to-end visibility into how data flows in your lakehouse with automated, real-time data lineage across all workloads in SQL, Python, Scala and R. Quickly perform data quality checks, complete impact analysis of data changes, and debug errors in your data pipelines. Lineage is captured across tables, columns, notebooks, workflows and dashboards. Lineage graphs in Unity Catalog are privilege-aware: users see only the lineage that their access privileges allow. Lineage can also be retrieved via the REST API to support integrations with other catalogs.
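To make "impact analysis" concrete: once lineage is available as a set of upstream-to-downstream edges, finding everything affected by a change is a graph traversal. The sketch below assumes the edges have already been exported (for instance from a lineage API) into plain tuples; the asset names are hypothetical and the function is not a Unity Catalog API.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def downstream(edges: Iterable[Tuple[str, str]], start: str) -> List[str]:
    """Return every asset reachable from `start`, in breadth-first order."""
    children: Dict[str, List[str]] = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)

    seen = {start}  # also guards against cycles in the lineage graph
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Hypothetical lineage: (upstream, downstream) pairs.
    edges = [
        ("raw.orders", "silver.orders"),
        ("silver.orders", "gold.revenue"),
        ("silver.orders", "dashboard.sales"),
        ("raw.users", "silver.users"),
    ]
    # Everything that may need a re-check if raw.orders changes.
    print(downstream(edges, "raw.orders"))
```

Breadth-first order lists the nearest dependents first, which tends to match the order in which a change would be validated.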
Secure data sharing across organizations
Unity Catalog natively supports Delta Sharing, the world’s first open protocol for secure data sharing, enabling you to easily share existing data in Delta Lake and Apache Parquet formats with any computing platform. Consumers don’t have to be on the Databricks platform, on the same cloud, or on any cloud at all. You can share live data without replicating or copying it to another system. Native integrations with Power BI, Tableau, Spark, pandas and Java allow recipients to consume shared data directly from the tools of their choice. You can centrally manage, govern, audit and track usage of the shared data on one platform.
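On the recipient side, reading a shared table into pandas might look like the sketch below. It assumes the open source `delta-sharing` Python connector is installed and that the data provider has supplied a profile file; the file name and the share, schema and table names are hypothetical.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address of a shared table."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table as a pandas DataFrame.

    The import is local so that the URL helper above stays usable even
    when the connector (pip install delta-sharing) is not installed.
    """
    import delta_sharing

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


if __name__ == "__main__":
    # Hypothetical profile file and table coordinates.
    print(table_url("config.share", "retail_share", "sales", "orders"))
```

Calling `load_shared_table("config.share", "retail_share", "sales", "orders")` would then fetch the data directly from the provider, with no Databricks workspace needed on the recipient side.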