Skip to main content

Announcing the General Availability of Row and Column Level Security with Databricks Unity Catalog

Provide Security for Your Tables with Row Filters and Column Masks
Share this post

We are excited to announce the general availability of Row Filters and Column Masks in Unity Catalog on AWS, Azure, and GCP! Managing fine-grained access controls on rows and columns in tables is critical to ensure data security and meet compliance. With Unity Catalog, you can use standard SQL functions to define row filters and column masks, allowing fine-grained access controls on rows and columns. Row Filters let you control which subsets of your tables' rows are visible to hierarchies of groups and users within your organization. Column Masks let you redact your table values based on the same dimensions.

"Unity Catalog allowed us to create a unified view of our data estate, simplifying collaboration across teams within BlackBerry. We now have a standard approach to manage access permissions and audit files or tables in our lake, with the ability to define fine-grained access controls on rows and columns. Automated data lineage helped us see where the data is coming from to pinpoint the source of a potential threat and to understand which research projects or teams are leveraging the data for threat detection."
— Justin Lai, Distinguished Data Architect, Blackberry

This blog discusses how you can enable fine-grained access controls using Row Filters and Column Masks.

What is Coarse-Grained Object-Level Security?

Before this announcement, Unity Catalog already supported object-level security. For example, you can use GRANT and REVOKE SQL commands over securable objects such as tables and functions to adjust which users and groups are allowed to inspect, query, or modify them:

USE CATALOG main;
CREATE SCHEMA accounts;
CREATE TABLE accounts.purchase_history(
  amount_cents BIGINT,
  region STRING,
  payment_type STRING,
  purchase_date DATE DEFAULT CURRENT_DATE())
USING DELTA;

We can grant read access to the accounts_team:

GRANT SELECT ON TABLE accounts.purchase_history TO accounts_team;

Now, the accounts_team has access to query (but not modify) the purchase_history table.

Prior Approaches for Sharing Subsets of Data with Different Groups

But what if we have separate accounts teams for different regions?

One strategy uses dynamic views. You can define a view specifically intended for consumption by specific user(s) or group(s):

CREATE VIEW accounts.purchase_history_emea
AS SELECT amount_cents, payment_type, purchase_date
FROM accounts.purchase_history
WHERE IS_ACCOUNT_GROUP_MEMBER('EMEA');

GRANT SELECT ON VIEW accounts.purchase_history_emea
TO accounts_team_emea;

This involves no data copying, but users still have to remember to query the accounts.purchase_history_emea table if they are in the EMEA region or the accounts.purchase_history_apac table if they are in the APAC region, and so on.

Dynamic views from an admin perspective have a valid value proposition for certain cases. However, for this example, some constraints apply:

  • Limited to queries; cannot insert or update data within views
  • Must create and maintain numerous views for each region
  • Shared SQL logic is cumbersome to reuse across different regional teams
  • Causes clutter in the Catalog Explorer

In addition to the above, dynamic views do not provide any protection from downstream users discovering values of rows scanned from referenced tables and then filtered or aggregated out within the view. For example, users could craft WHERE clauses that throw errors in response to certain column values with the intention of the query optimizer pushing these operations down within the view evaluation itself.

As a last resort, we could instead create a daily job to copy subsets of data into different tables and set their permissions accordingly:

-- Create a table for data from the EMEA region and grant
-- read access to the corresponding accounts group.
CREATE TABLE accounts.purchase_history_emea(
  amount_cents INT,
  payment_type STRING,
  purchase_date DATE DEFAULT CURRENT_DATE())
USING DELTA;

GRANT SELECT ON TABLE accounts.purchase_history_emea TO accounts_team_emea;

-- Run this daily to update the custom table.
-- Use the previous day to make sure all the data is available before
-- copying it.
INSERT INTO accounts.purchase_history_emea
SELECT * EXCEPT (region) FROM accounts.purchase_history
WHERE region = 'EMEA' AND purchase_date = DATE_SUB(CURRENT_DATE(), 1);

While this approach effectively addresses query needs, it comes with drawbacks. By duplicating data, we increase storage and compute usage. Also, the duplicated data lags behind the original, introducing staleness. Moreover, this solution caters solely to queries due to restricted user permissions, limiting write access to the primary table.

Introducing Row Filters

With row filters, you can apply predicates to a table, ensuring that only rows meeting specific criteria are returned in subsequent queries.

Each row filter is implemented as a SQL user-defined function (UDF). To begin, write a SQL UDF with a boolean result whose parameter type(s) are the same as the column(s) of your target table that you want to control access by.

For consistency, let's continue using the region column of the previous accounts.purchase_history table for this purpose.

CREATE FUNCTION accounts.purchase_history_row_filter(region STRING)
RETURN CASE
  WHEN IS_ACCOUNT_GROUP_MEMBER('emea') THEN region = 'EMEA'
  WHEN IS_ACCOUNT_GROUP_MEMBER('admin') THEN TRUE
  ELSE FALSE
END;

We can test this logic by performing a few queries over the target table and applying the function directly. For someone in the accounts_team_emea group, such a query might look like this:

SELECT amount_cents,
  region,
  accounts.purchase_history_row_filter(region) AS filtered 
FROM accounts.purchase_history;

+--------------+--------+----------+
| amount_cents | region | filtered |
+--------------+--------+----------+
| 42           | EMEA   | TRUE     |
| 1042         | EMEA   | TRUE     |
| 2042         | APAC   | FALSE    |
+--------------+--------+----------+

Or for someone in the admin group who is setting the access control logic in the first place, we find that all rows from the table are returned:

SELECT amount_cents, region, purchase_history_row_filter(region) AS filtered 
FROM accounts.purchase_history;

+--------------+--------+----------+
| amount_cents | region | filtered |
+--------------+--------+----------+
| 42           | EMEA   | TRUE     |
| 1042         | EMEA   | TRUE     |
| 2042         | APAC   | TRUE     |
+--------------+--------+----------+

Now we're ready to apply this logic to our target table as a policy function, and grant read access to the accounts_team_emea group:

ALTER TABLE accounts.purchase_history
SET ROW FILTER accounts.purchase_history_row_filter ON (region);

GRANT SELECT ON TABLE accounts.purchase_history TO accounts_team_emea;

Or, we can assign this policy directly to the table at creation time to make sure there is no period where the table exists, but the policy does not yet apply:

CREATE TABLE accounts.purchase_history_emea(
  amount_cents INT,
  payment_type STRING,
  purchase_date DATE DEFAULT CURRENT_DATE())
USING DELTA
WITH ROW FILTER purchase_history_row_filter ON (region);

GRANT SELECT ON TABLE accounts.purchase_history TO accounts_team_emea;

After that, querying from the table should return the subsets of rows corresponding to the results of our testing above. For example, the accounts_team_emea members will receive the following result:

SELECT amount_cents, region FROM accounts.purchase_history;

+--------------+--------+
| amount_cents | region |
+--------------+--------+
| 42           | EMEA   |
| 1042         | EMEA   |
+--------------+--------+

But what if we want to write new data to the table? This would not be possible if accounts.purchase_history was a dynamic view, but since it is a table with a row filter, we can simply use SQL to insert new rows as needed:

INSERT INTO accounts.purchase_history(amount_cents, region)
VALUES (1043, 'EMEA');

SELECT amount_cents, region FROM accounts.purchase_history;

+--------------+--------+
| amount_cents | region |
+--------------+--------+
| 42           | EMEA   |
| 1042         | EMEA   |
| 1043         | EMEA   |
+--------------+--------+

Now, we can share the same accounts.purchase_history table with different groups without copying the data or adding many new names into our namespace.

You can view this information on the Catalog Explorer. Looking at the purchase_history table, we see that a row filter applies:

Catalog Explorer

Clicking on the row filter, we can see the policy function name:

Policy Function

Following the "view" button reveals the function contents:

Function Contents

Introducing Column Masks

We have demonstrated how to create and apply fine-grained access controls to tables using row filters, selectively filtering out rows that the invoking user does not have access to read at query time. But what if we want to control access to columns instead, eliding some column values and leaving others intact within each row?

Announcing column masks!

Each column mask is also implemented as a SQL user-defined function (UDF). However, unlike row filter functions returning boolean results, each column mask policy function accepts one argument and returns the same type as this input argument. For example, we can create column masks to filter out PII in email addresses with policies like this:

CREATE FUNCTION email_mask(email STRING)
RETURN CASE
  WHEN IS_ACCOUNT_GROUP_MEMBER('admin') THEN email
  ELSE SUBSTRING(
    SPLIT_PART(email, "@", 1), 1, 1) || "####" || "@" ||
    SPLIT_PART(email, "@", 2)
END;

For our running accounts.purchase_history table here, let's go ahead and mask out the purchase amount column when the value is more than one thousand:

CREATE FUNCTION accounts.purchase_history_mask(amount_cents INT)
RETURN CASE
  WHEN IS_ACCOUNT_GROUP_MEMBER('admin') THEN amount_cents
  WHEN amount_cents < 1000 THEN amount_cents
  ELSE NULL
END;

Now, only administrators have permission to look at the purchase amounts of $10 or greater.

Let's go ahead and test the policy function. Non-admin users see this:

SELECT amount_cents,
  accounts.purchase_history_mask(amount_cents) AS masked,
  region
FROM accounts.purchase_history;

+--------------+--------+----------+
| amount_cents | masked | region   |
+--------------+--------+----------+
| 42           | 42     | EMEA     |
| 1042         | NULL   | EMEA     |
| 2042         | NULL   | APAC     |
+--------------+--------+----------+

But administrators have access to view all the data:

SELECT amount_cents,
  accounts.purchase_history_mask(amount_cents) AS masked,
  region
FROM accounts.purchase_history;

+--------------+--------+----------+
| amount_cents | masked | region   |
+--------------+--------+----------+
| 42           | 42     | EMEA     |
| 1042         | 1042   | EMEA     |
| 2042         | 2042   | APAC     |
+--------------+--------+----------+

Looks great! Let's apply the mask to our table:

ALTER TABLE accounts.purchase_history
ALTER COLUMN amount_cents
SET MASK accounts.purchase_history_mask;

After that, querying from the table should redact specific column values corresponding to the results of our testing above. For example, non-administrators will receive the following result:

SELECT amount_cents, region FROM accounts.purchase_history;

+--------------+--------+
| amount_cents | region |
+--------------+--------+
| 42           | EMEA   |
| NULL         | EMEA   |
| NULL         | APAC   |
+--------------+--------+

It works correctly.

We can also inspect the values of other columns to make our masking decision. For example, we can modify the function to look at the region column instead of the purchase amount:

ALTER TABLE accounts.purchase_history ALTER COLUMN amount_cents DROP MASK;

CREATE FUNCTION accounts.purchase_history_region_mask(
  amount_cents INT,
  region STRING)
RETURN CASE
  WHEN IS_ACCOUNT_GROUP_MEMBER('admin') THEN amount_cents
  WHEN region = 'APAC' THEN amount_cents
  ELSE NULL
END;

Now we can apply the mask with the USING COLUMNS clause to specify the additional column name(s) to pass into the policy function:

ALTER TABLE accounts.purchase_history
ALTER COLUMN amount_cents
SET MASK accounts.purchase_history_mask
USING COLUMNS (region);

Thereafter, querying from the table should redact certain column values differently for non-administrators:

SELECT amount_cents, region FROM accounts.purchase_history;

+--------------+--------+
| amount_cents | region |
+--------------+--------+
| NULL         | EMEA   |
| NULL         | EMEA   |
| 2042         | APAC   |
+--------------+--------+

We can look at the mask by looking at the table column in the Catalog Explorer:

Catalog Explorer

Like before, following the "view" button reveals the function contents:

Function Contents

Storing Access Control Lists in Mapping Tables

Row filter and column mask policy functions almost always need to refer to the current user and compare it against a list of allowed users or check its group memberships against an explicit list of allowed groups. Listing these user and group allowlists in the policy functions themselves works well for lists of reasonable sizes. For larger lists or cases where we would prefer extra assurance that the identities of the users or groups themselves are hidden from view for users, we can take advantage of mapping tables instead.

These mapping tables act like personalized gatekeepers, deciding which data rows users or groups can access in your original table. The beauty of mapping tables lies in their seamless integration with fact tables, making your data security strategy more effective.

This approach is a game-changer for various custom requirements:

  • Tailored User Access: You can impose restrictions based on individual user profiles while accommodating specific rules for user groups. This ensures that each user sees only what they should.
  • Handling Complex Hierarchies: Whether it's intricate organizational structures or diverse sets of rules, mapping tables can navigate the complexities, ensuring that data access adheres to your unique hierarchy.
  • Seamless External Model Replication: Replicating complex security models from external source systems becomes a breeze. Mapping tables help you mirror these intricate setups without breaking a sweat.

For example:

CREATE TABLE accounts.purchase_history_groups
AS VALUES ('emea'), ('apac') t(group);

CREATE OR REPLACE FUNCTION accounts.purchase_history_row_filter(region STRING)
RETURN EXISTS(SELECT 1 FROM accounts.purchase_history_groups phg
WHERE IS_ACCOUNT_GROUP_MEMBER(phg.group));

Now, we can extend the accounts.purchase_history_groups table to large numbers of groups without making the policy function itself complex, and also restrict access to the rows of that table to only the administrator that created the accounts.purchase_history_row_filter SQL UDF.

Using Row and Column Level Security with Lakehouse Federation

With Lakehouse Federation, Unity Catalog solves critical data management challenges to simplify how organizations handle disparate data systems. It allows you to create a unified view of your entire data estate, structured and unstructured, enabling secure access and exploration for all users regardless of data source. It allows efficient querying and data combination through a single engine, accelerating various data analysis and AI applications without requiring data ingestion. Additionally, it provides a consistent permission model for data security, applying access rules and ensuring compliance across different platforms.

The fine-grained access controls announced here work seamlessly with Lakehouse Federation tables to support sharing access to federated tables within your organizations with custom row and column-level access policies for different groups. There is no need to copy data or create many duplicate or similar table/view names in your catalogs.

For example, you can create a federated connection to an existing MySQL database. Then, browse the Catalog Explorer to inspect the foreign catalog:

Foreign Catalog

Inside the catalog, we find a mysql_demo_nyc_pizza_rating table:

mysql demo

Catalog

Let's apply our row filter to that table:

ALTER TABLE mysql_catalog.qf_mysql_demo_database.mysql_demo_nyc_pizza_rating 
SET ROW FILTER main.accounts.purchase_history_row_filter ON (name);

Looking at the table overview afterward, it reflects the change:

mysql

Clicking on the row filter reveals the name of the function, just like before:

MySQL table

Now, queries over this federated MySQL table will return different subsets of rows depending on each invoking user's identity and group memberships. We've successfully integrated fine-grained access control with Lakehouse Federation, resulting in simplified usability and unified governance for Delta Lake and MySQL tables in the same organization.

Getting started with Row and Column Level Security

With Row Filters and Column Masks, you now gain the power to streamline your data management, eliminating excessive ETL pipelines and data copies. This is your gateway to a new world of unified data security, where you can confidently share data with multiple users and groups while maintaining control and ensuring that sensitive information remains protected.

To get started with Row Filters and Column Masks, check out our documentation on AWS, Azure, and GCP. You can run workloads that query tables with Row Filters and Column Masks on any UC Compute: Serverless, shared access mode, Single-user access mode (from DBR 15.4 onwards)

Try Databricks for free

Related posts

See all Platform Blog posts