What's new with Databricks Unity Catalog at the Data+AI Summit 2022

Unified Governance for all data and AI assets

Update: Unity Catalog is now generally available on AWS and Azure.

Today we are excited to announce that Unity Catalog, a unified governance solution for all data assets on the Lakehouse, will be generally available on AWS and Azure in the upcoming weeks. Currently, you can apply for a public preview or reach out to a member of your Databricks account team.

In a previous blog, we set out our vision for a governed lakehouse and how Unity Catalog can help customers simplify governance at scale. This blog will explore the most recent updates to Unity Catalog and our growing partner ecosystem.

What's new with Unity Catalog for the Data+AI Summit 2022?

Automated Data Lineage for all workloads

Unity Catalog now automatically tracks data lineage across queries executed in any language. Lineage is captured down to the table and column level, and key assets such as notebooks, dashboards, and jobs are tracked as well. Lineage opens up several use cases, including assessing the impact that changes to tables will have on your data consumers and auto-generating documentation that consumers can use to understand data in the lakehouse. For more information, see our recent blog post.

Built-in Data Search and Discovery

Unity Catalog now includes a built-in search capability. Once data is registered in Unity Catalog, end users can easily search across metadata fields including table names, column names, and comments to find the data they need for their analysis. This search capability automatically leverages the governance model put in place by Unity Catalog. Users will only see search results for data they have access to, which serves as a productivity boost for the user, and a critical control for data administrators who want to ensure that sensitive data is protected.

Search and Discovery in Unity Catalog

Simplified access controls with privilege inheritance

Unity Catalog offers a simple model to control access to data via a UI or SQL. We have now extended this model to allow data admins to set up access to thousands of tables via a single click or SQL statement. This is achieved through a privilege inheritance model, which allows admins to set access policies on whole catalogs or schemas of objects. For example, executing the following SQL statement will give ml_team read access to all current tables and views in the main catalog, and any that are created in the future.


GRANT SELECT ON CATALOG main TO ml_team

This also serves as a way to set safe access defaults on catalogs and schemas. A common pattern may be to give a team a schema to store their data. Now an admin can set a policy on that schema so that by default all team members can read objects created by others.
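
The schema-level pattern described above might look like the following sketch. The schema name main.ml_sandbox is hypothetical, and ml_team is reused from the earlier statement. The USE CATALOG and USE SCHEMA privilege names are an assumption based on the current Databricks SQL privilege model (earlier releases used USAGE instead), so check the documentation for your workspace before relying on them.

```sql
-- Hypothetical schema: main.ml_sandbox. Principal reused from the example above: ml_team.

-- Allow the team to reach the catalog and the schema.
-- Assumption: current privilege names; earlier releases used USAGE.
GRANT USE CATALOG ON CATALOG main TO ml_team;
GRANT USE SCHEMA ON SCHEMA main.ml_sandbox TO ml_team;

-- Through privilege inheritance, this one grant covers every current
-- and future table and view in the schema.
GRANT SELECT ON SCHEMA main.ml_sandbox TO ml_team;

-- Review what has been granted on the schema.
SHOW GRANTS ON SCHEMA main.ml_sandbox;
```

Granting at the schema level rather than the catalog level keeps the default narrow: the team can read each other's objects in its own schema without gaining access to the rest of the catalog.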

Information Schema

Information Schemas have been a fundamental asset within database systems for decades. They offer a pre-defined set of views that describe the objects within the database: for example, what tables have been created, when, by whom, and what access levels have been granted on each. This metadata is often used to understand what data is available in the system, and also to automate report generation on topics such as access levels per table. Unity Catalog brings the concept of the Information Schema to the lakehouse. Each catalog you create in Unity Catalog arrives with a pre-defined schema called information_schema, which defines a set of views that describe the catalog. These views can be queried from Databricks SQL or the notebook environment.
Information Schema in Unity Catalog
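
As a rough illustration of the two uses mentioned above (discovering what data exists and reporting on access levels per table), queries along these lines could be run against the main catalog. The view and column names (tables, table_privileges, created, created_by, grantee, privilege_type) are assumptions drawn from the Databricks information schema documentation as we understand it. Verify them against your workspace before building reports on them.

```sql
-- What tables exist in the main catalog, when were they created, and by whom?
-- Assumption: column names as documented for information_schema.tables.
SELECT table_schema, table_name, created, created_by
FROM main.information_schema.tables
WHERE table_schema <> 'information_schema'
ORDER BY created DESC;

-- Access levels per table: who holds which privilege?
-- Assumption: column names as documented for information_schema.table_privileges.
SELECT table_schema, table_name, grantee, privilege_type
FROM main.information_schema.table_privileges
ORDER BY table_schema, table_name, grantee;
```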

Azure Managed Identities in Unity Catalog

We are excited that Unity Catalog now supports using an Azure Managed Identity for accessing both managed storage and external storage in a Unity Catalog metastore. Managed Identities are a Microsoft Azure construct that provides an identity for applications to use when connecting to resources that support Azure Active Directory (AAD). Up to this point, Unity Catalog relied on Service Principals as an identity to gain access to data in Azure Data Lake Storage (ADLS). Managed Identities have two major benefits over Service Principals for this use case. First, Managed Identities do not require maintaining credentials or rotating secrets. Second, they offer a way to connect to ADLS storage that is protected by a storage firewall.

Upgrade your Hive Metastore to Unity Catalog

Unity Catalog now offers a seamless upgrade experience from your existing Hive Metastore, so you can take advantage of all the new features described above. Users can select thousands of tables to upgrade at once within our purpose-built user interface. The upgrade tool works by copying metadata for tables from existing Hive Metastores to a Unity Catalog metastore. It also automatically resolves DBFS mount points that have been used in the definition of the tables, so that data can be securely accessed across your entire Databricks account. For those who prefer code over UIs, we also make the SQL syntax ('CREATE TABLE LIKE…') available for running against a Databricks cluster or SQL Warehouse.

Upgrade Hive Metastore

Better together with our governance and catalog partners

In addition to all the features and capabilities you've read about, we also have a healthy and vibrant ecosystem of partners who are joining us in supporting Unity Catalog with their products. The ecosystem is growing every day.

Privacera
"Privacera integrates with Unity Catalog by leveraging the new APIs built by the Databricks team and through a policy translation layer built by Privacera. The integration is transparent to data consumers and IT administrators and supports the same fine-grained access control functionality that is supported in Privacera integration with legacy Databricks High Concurrency clusters." –Don Bosco Durai

Don Bosco Durai
Don Bosco Durai is the co-founder and CTO of Privacera. Bosco is also the creator of the ASF Open Source project Apache Ranger and a thought leader in the security and governance space.

Read more about Privacera and Unity Catalog.

Immuta
With Unity Catalog, physical data policy enforcement is native to Databricks, less invasive to data consumers, and no longer tied to plugins specifically built for different Spark runtimes - enforcement done correctly. Meanwhile, Immuta continues to solve management challenges by providing active data monitoring, metadata discovery/centralization, scalable policy orchestration (table-, row-, column-, and cell-level controls) to include leveraging Unity Catalog's lineage features to simplify policy enforcement, and compliance reporting/alerting. – Steve Touw

Steve Touw
Steve Touw is the co-founder and CTO of Immuta. He has a long history of designing large-scale geo-temporal analytics across the U.S. intelligence community – including some of the very first Hadoop analytics and frameworks to manage complex multi-tenant data policy controls.

Read more about Immuta and Unity Catalog.

Alation
Alation and Databricks help organizations to gain data intelligence, eliminate silos, and promote governance capabilities to drive digital transformation projects. Alation enables organizations to nurture data as an asset – helping to enhance data discovery, aid understanding, promote trust and ensure compliance with relevant policies. Leveraging the data captured by the Unity metastore, Alation will enhance our existing integration with Databricks by easily including metadata from multiple workspaces. Together Databricks and Alation will ultimately provide catalog, lineage and policy management and enforcement for the Lakehouse. Alation is thrilled to partner with Databricks and looking forward to working jointly to enable data scientists, engineers, and analysts to quickly turn data into business insights. - Ibrahim "Ibby" Rahmani

Ibrahim Rahmani
Ibrahim "Ibby" Rahmani is Director of Product Marketing at Alation.

Collibra
Many of Collibra's most strategic customers have found great value from the power of Databricks. This has been the focus of our technical integration with Unity Catalog. Collibra's enterprise catalog brings value to business and governance personas and, thus, we think that Unity Catalog's tactical platform focus is a perfect pairing. There are also benefits at the metadata ingestion level because there is no longer a need to have a Databricks cluster running to pull metadata. We feel that lineage, direct from a platform API like Unity Catalog, is better quality and easier to update over time as processing changes. –Vaughn Micciche

Vaughn Micciche
Vaughn Micciche is the Technical Partnership Director at Collibra.

Atlan
Atlan connects to Databricks Unity Catalog's API to extract all relevant metadata powering discovery, governance, and insights inside Atlan. This integration allows Atlan to generate lineage for tables, views, and columns for all the jobs and languages that run on Databricks. By pairing this with metadata extracted from other tools in the data stack (e.g. BI, transformation, ELT), Atlan can create true end-to-end lineage. Thanks to Unity Catalog's simplified delivery system, which sends complete lineage through its API, this entire experience is near instantaneous with drastically reduced compute and cost for customers. This allows Databricks customers to holistically understand the flow of their data, gain deeper insight into the data populating their models, run RCA exercises, and even power programmatic governance at scale with Atlan's metadata activation engine. –Amit Prabhu

Amit Prabhu
Amit Prabhu is a Software Architect at Atlan, leading the Orchestration team.

Getting Started with Unity Catalog on AWS and Azure

Visit the Unity Catalog documentation [AWS, Azure] to learn more.
