
Completing the Lakehouse Vision: Open Storage, Open Access, Unified Governance


Published: December 2, 2025

Announcements · 6 min read

Summary

• Unity Catalog is now the only catalog to unify attribute-based access control across engines, enforcing row filters and column masks everywhere data is accessed.
• Unity Catalog applies server-side filtered scan plans so external engines automatically receive only authorized data, without custom policy logic or duplication.
• Teams can finally adopt a single, scalable security model for Delta Lake and Iceberg, achieving consistent fine-grained governance across the open lakehouse and unlocking true interoperability.

Until now, organizations have had no way to unify attribute-based access control (ABAC) policies across the different query engines and tools they use in the lakehouse. Today, we are excited to announce the Preview of fine-grained access controls for external engines. With this launch, Unity Catalog becomes the first and only catalog to support cross-engine ABAC, allowing teams to define tag-based row filters and column masks once and have them enforced everywhere data is accessed.

Unified governance with Unity Catalog eliminates one of the biggest remaining challenges in the lakehouse: fragmented governance across different engines. Built on the Apache Iceberg REST catalog APIs, Unity Catalog ensures that data stays open to external engines while remaining fully governed with fine-grained policies.

Completing the Lakehouse vision

As organizations adopted the lakehouse architecture, data moved out of proprietary warehouse systems and into open storage and open table formats, where it could be accessed from multiple engines, such as Spark, Trino, and DuckDB, without duplication or lock-in.

But this new level of openness introduced a hard problem: governance needed to work everywhere data could be accessed. In traditional warehouses, row- and column-level controls were enforced inside a single engine. In the open lakehouse, data could now be accessed from many engines — and ensuring consistent policy enforcement across all of them became a new challenge.

Security teams were forced to choose among complicated or risky approaches, such as:

  • Manually duplicating access control policies across multiple systems, increasing operational burden and risking policy drift
  • Maintaining separate filtered views or table copies for different engines or regions, which introduces duplication and inconsistency
  • Bypassing fine-grained governance entirely by granting broad table-level access to downstream tools

Customers consistently told us that while open formats delivered on the promise of flexible compute over shared data, that vision could only be fully realized if governance was unified as well.

Challenges with fine-grained governance across multiple engines

Unity Catalog already provides universal table-level access control across multiple engines through credential vending — a major step forward in unifying governance in the lakehouse. However, table-level access alone does not cover scenarios where users should only see a subset of rows or columns. Sensitive data often needs more granular controls, such as regional restrictions or masking for PII.

Fine-grained enforcement typically happens inside the compute engine. In Databricks runtimes, we solve this using server-side filtering, where queries that require fine-grained enforcement are transparently routed through a secure filtering fleet so only permitted data is processed. But external engines such as DuckDB, Trino, or Spark running outside Databricks don’t have this logic built in, and do not enforce governance policies with consistent semantics.

This surfaced a core challenge: organizations want the flexibility to use any engine in the lakehouse, without losing fine-grained governance.

Key principles

Our customers told us that any solution needed to:

  • Work across any engine and tool, including those without native governance capabilities (e.g., DuckDB, Python, pandas)
  • Preserve the full expressiveness of Unity Catalog policies, including attribute-based access control (ABAC) and custom row and column masking logic
  • Operate efficiently at scale, without introducing high latency or complex operational overhead

Our approach

There are two approaches to extending fine-grained governance across engines:

  1. Policy exchange – Vendors agree on a shared policy language and enforcement model that every engine implements.
  2. Centralized enforcement – The governing catalog evaluates policies and enforces them before data reaches the external engine.

In the long run, we expect the ecosystem to converge on the first approach — a shared policy standard that trusted engines can enforce natively.

However, doing this correctly requires solving three foundational challenges:

  • Developing a common policy language and semantics - Identity models, policy semantics, and execution environments vary widely across systems. Any early cross-vendor standard would risk converging on a lowest-common-denominator language that cannot support advanced policies such as attribute-based access control or complex subqueries
  • Transmitting policy context securely - Catalogs must be able to share policy context with external engines without leaking sensitive identity or attribute information
  • Establishing trust and verifiable enforcement across engines - Catalogs need confidence that engines will enforce policies securely and consistently

Solving these challenges will require broad alignment across the ecosystem. We’ve already begun early discussions in the Apache Iceberg community to explore the foundations of a shared policy model that can support the sophisticated governance requirements that enterprises rely on.

Unity Catalog’s ABAC model is built for enterprise-grade security, supporting expressive controls such as row filters with complex subqueries, conditional logic driven by governed tags and user attributes, and advanced masking functions powered by SQL, Python and Scala UDFs. We are committed to collaborating with the open-source community to establish a standard that supports these advanced security primitives and scales across the ecosystem.
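To make this concrete, here is a minimal sketch of the kind of row filter and column mask Unity Catalog can enforce, expressed as standard SQL UDF and ALTER TABLE statements run from a Databricks notebook (where `spark` is the ambient SparkSession). The catalog, schema, table, and group names are hypothetical, and tag-driven ABAC policies build on these same primitives.

```python
# Minimal sketch with hypothetical names: a row filter and a column mask
# defined once in Unity Catalog and attached to a governed table.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.governance.us_only(region STRING)
  RETURN IS_ACCOUNT_GROUP_MEMBER('global_analysts') OR region = 'US'
""")

spark.sql("""
  CREATE OR REPLACE FUNCTION main.governance.mask_email(email STRING)
  RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('pii_readers')
              THEN email ELSE '<redacted>' END
""")

# Every engine that reads this table through Unity Catalog sees only the
# permitted rows and the masked column values.
spark.sql(
    "ALTER TABLE main.sales.orders "
    "SET ROW FILTER main.governance.us_only ON (region)"
)
spark.sql(
    "ALTER TABLE main.sales.orders "
    "ALTER COLUMN email SET MASK main.governance.mask_email"
)
```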

In parallel, centralized enforcement is the only approach that delivers fine-grained governance on external engines today. It allows customers to retain the full expressiveness of Unity Catalog’s governance capabilities, guarantees consistent enforcement on any external engine (including those with no native policy runtime), and avoids relying on implicit trust in external systems.

Powered by open standards

To enable cross-engine enforcement, Unity Catalog builds on open standards. The Iceberg REST catalog protocol allows engines to request that the server determine which data should be read and how it should be accessed. This is called “scan planning”, and it enables the catalog to optimize data access patterns and improve query performance.

Unity Catalog extends this model by generating filtered scan plans that apply fine-grained policies—such as row filters, column masks, and attribute-based rules—based on the user’s entitlements.

Here’s how it works. When an external engine requests access to a governed table:

  • The engine issues a scan request using the Iceberg REST catalog API
  • Unity Catalog evaluates applicable fine-grained policies
  • Unity Catalog applies the requisite row filters and column masks and returns a filtered scan plan

Only the authorized subset of data is exposed to the engine for processing. Because enforcement occurs as part of server-side planning, no custom policy logic is required in the engine.
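For example, here is a hedged sketch of how an external Apache Spark cluster (running outside Databricks) might be pointed at Unity Catalog's Iceberg REST endpoint so its reads go through this server-side planning. The endpoint path, catalog name, token handling, and package version are assumptions that will vary by workspace and client; this is an illustration, not the definitive setup.

```python
from pyspark.sql import SparkSession

# Hedged sketch: external Spark configured to use Unity Catalog as an
# Iceberg REST catalog. Endpoint, warehouse, and credentials are placeholders.
spark = (
    SparkSession.builder
    .appName("external-engine-example")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.uc", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.uc.type", "rest")
    # Assumed endpoint path; check your workspace for the actual Iceberg REST URI
    .config("spark.sql.catalog.uc.uri",
            "https://<workspace-host>/api/2.1/unity-catalog/iceberg-rest")
    .config("spark.sql.catalog.uc.token", "<personal-access-token>")
    .config("spark.sql.catalog.uc.warehouse", "main")  # Unity Catalog catalog name
    .getOrCreate()
)

# The scan for this query is planned by the catalog; with filtered scan plans,
# the engine receives only the rows and columns this user is authorized to see.
spark.sql("SELECT * FROM uc.sales.orders LIMIT 10").show()
```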

Databricks uses highly optimized serverless compute to perform this filtering, leveraging the same infrastructure used to ensure security within Databricks runtimes. Within this system, filtered data is incrementally and efficiently processed using the technology that also underpins Databricks’ declarative pipelines. This ensures that results are returned at a low cost and with minimal latency.

By building on the Iceberg REST catalog APIs, Unity Catalog sets the foundation for open, cross-engine governance, where policies are enforced efficiently and consistently across engines on a single copy of data.

Define policies once, enforce everywhere

Fine-grained enforcement on external engines allows customers to adopt a single, scalable security model for all Unity Catalog managed tables, both Delta Lake and Iceberg.

With ABAC, administrators define row filters and column masking logic once, and policies are dynamically applied based on governed tags and user attributes. Now, that same unified policy layer extends beyond Databricks, starting with Apache Spark and expanding further as the open ecosystem adopts the Iceberg REST catalog Scan APIs.
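As additional clients adopt those scan APIs, even lightweight tools without any policy runtime could read governed tables the same way. Below is a hypothetical sketch using PyIceberg and pandas; the endpoint, token, and table names are placeholders, and server-side filtered scans apply only where the client supports the Iceberg REST scan-planning APIs.

```python
import pandas as pd
from pyiceberg.catalog import load_catalog

# Hypothetical sketch: a Python client reading a governed table through
# Unity Catalog's Iceberg REST interface. URI, token, and names are placeholders.
catalog = load_catalog(
    "uc",
    **{
        "type": "rest",
        "uri": "https://<workspace-host>/api/2.1/unity-catalog/iceberg-rest",
        "token": "<personal-access-token>",
        "warehouse": "main",
    },
)

table = catalog.load_table("sales.orders")

# Assuming the client defers planning to the catalog, the result already
# reflects the row filters and column masks for the calling principal.
df: pd.DataFrame = table.scan(limit=10).to_pandas()
print(df.head())
```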

What this means for the open lakehouse ecosystem

Fine-grained governance on external engines marks an important milestone for the open lakehouse. Open table formats gave customers the flexibility to run multiple engines on shared data. Unity Catalog now provides the corresponding governance layer, allowing ABAC policies to be enforced regardless of where data is accessed.

This establishes a new foundation for interoperability in the open lakehouse: open formats for data, open APIs for access, and a single governance system applied consistently across the ecosystem.

What’s next

We are onboarding select customers to try out fine-grained access controls for external engines, with broader support to follow as additional engines adopt the Iceberg REST catalog scan APIs. Please fill out this form or contact your Databricks account team to be included in the preview.
