Bringing the operational database into Unity Catalog
In part 1 of this series, we explored how moving Backstage's underlying database to Databricks Lakebase turned risky schema migrations into 1-second branch-and-test operations. But a faster developer cycle only gets you so far if Security and Governance teams are still treating your operational database like a black box.
In a traditional stack, your application database and your data lake live in two entirely different security paradigms. The ownership graph for your infrastructure lives in Backstage, backed by an isolated RDS instance and governed by complex IAM roles and Postgres-native grants. Meanwhile, your warehouse data is governed by the data team using Unity Catalog – an open-source governance framework created by Databricks that provides a unified layer for data, AI, and now operational databases: a single place to manage access controls, audit trails, lineage, and compliance across everything on the platform.
To audit a single table drop on RDS, you'd need to cross-reference CloudTrail for the IAM principal, pg_stat_activity or pgaudit logs for the SQL statement, and CloudWatch for the timestamp – three services, three query languages, three access policies. The operational database becomes a compliance side-channel.

When we pointed Backstage at Lakebase, we didn't just change where the data lived; we changed where the access policy lived.
Because Lakebase is natively embedded inside Databricks, Unity Catalog extends directly over the operational Postgres database. In this POC, we used Lakehouse Federation to expose the Backstage catalog as a foreign catalog (lakebase_bs) in Unity Catalog. Once it's there, standard UC grants control who can see what – no Postgres-level role management required:
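A minimal sketch of what those grants look like – the group names here are illustrative, not from our actual deployment:

```sql
-- Give the platform team read access across the whole foreign catalog
-- (group names are hypothetical examples)
GRANT USE CATALOG, USE SCHEMA, SELECT
  ON CATALOG lakebase_bs TO `platform-engineers`;

-- Scope a broader audience down to a single schema
GRANT USE CATALOG ON CATALOG lakebase_bs TO `all-employees`;
GRANT USE SCHEMA, SELECT ON SCHEMA lakebase_bs.public TO `all-employees`;
```

Revoking access is the same one-liner in reverse – no `pg_hba.conf` edits, no per-instance role scripts.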
While we didn't build end-to-end Row-Level Security policies for Backstage in this POC, architecturally, the exact same RLS rules that protect sensitive billing tables can be applied directly to these operational tables. The wall between "operational" and "analytical" stops being a physical boundary, and simply becomes an access pattern.
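To make that concrete, here is what the UC row-filter syntax looks like – a sketch only, since we didn't ship RLS in this POC; the function, table, and column names are hypothetical:

```sql
-- Hypothetical row filter: admins see everything, everyone else sees US rows only
CREATE OR REPLACE FUNCTION main.governance.us_only(region STRING)
RETURN is_account_group_member('admins') OR region = 'US';

-- Attach the same filter to an operational table surfaced through UC
ALTER TABLE lakebase_bs.public.billing_accounts
  SET ROW FILTER main.governance.us_only ON (region);
```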
Remember the 1-second copy-on-write branching we executed in Part 1? In a traditional setup, proving to a security engineer that a developer only branched the database for an hour and then destroyed it is a manual exercise.
With Lakebase, every control-plane action against the operational database is automatically recorded in system.access.audit. To prove this, we queried the audit log for the exact branch operations from our Part 1 disaster-recovery experiment:
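The query looked roughly like this – note that the exact `service_name` and `action_name` values for Lakebase branch events are assumptions here, not verbatim from our logs:

```sql
-- Pull branch create/delete events from the UC audit system table
-- (service/action name filters are illustrative)
SELECT event_time,
       user_identity.email,
       source_ip_address,
       action_name,
       request_params
FROM system.access.audit
WHERE service_name = 'database'
  AND action_name IN ('createDatabaseBranch', 'deleteDatabaseBranch')
ORDER BY event_time;
```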
Every branch creation and deletion from our Part 1 experiments is logged. Each event is tied to a specific OAuth user identity and source IP, captured automatically, and governed by the same Row-Level Security controls as every other audit table in Unity Catalog. No CloudTrail cross-referencing. No RDS log parsing. One SQL query.
A governance team doesn't just want to know who created a branch – they want to know what it cost.
In a traditional AWS environment, tracking the cost of an ephemeral RDS instance requires custom CloudWatch tagging strategies that often miss short-lived workloads. Because Lakebase integrates natively with Unity Catalog's system billing tables, compute costs break down automatically by project_id, branch_id, and endpoint_id.
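A sketch of the attribution query – the `usage_metadata` field paths and the SKU filter are assumptions about the billing table layout, not verified column names:

```sql
-- Per-branch DBU attribution from the UC billing system table
-- (usage_metadata field names and SKU filter are illustrative)
SELECT usage_metadata.project_id,
       usage_metadata.branch_id,
       usage_metadata.endpoint_id,
       SUM(usage_quantity) AS total_dbu
FROM system.billing.usage
WHERE sku_name LIKE '%LAKEBASE%'
GROUP BY 1, 2, 3
ORDER BY total_dbu DESC;
```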
In this POC, the production branch was billed at 31.6130 DBU, while the dropped test branch was independently attributed 0.0107 DBU. The audit trail and the cost trail are governed in the exact same place.
Our governance story answers the compliance question: can we prove who did what, when, and what it cost? The answer is yes – one SQL query instead of three services. But there's a second governance question that matters just as much for development teams adopting the branching workflow from Part 1: what happens to governance when your team creates dozens of branches per sprint?
In Part 1, we described a workflow where every feature branch and every pull request gets its own isolated database copy. A team of six developers running two-week sprints might create and destroy 30-40 branches in a single sprint. That's 30-40 copies of production data, each one potentially containing sensitive fields – customer PII, financial records, health data.
This is where Unity Catalog's branch-level governance becomes load-bearing, not just convenient. When a Lakebase branch is created, Unity Catalog's attribute-level masking policies propagate automatically to the new branch. A developer working on their feature branch never sees unmasked production data – not because someone remembered to configure it, but because the governance layer enforces it at creation time. The CI branch that runs your PR tests is governed identically to production. The QA branch where a tester runs destructive scenarios is governed identically to production. There is no "non-production exception" where sensitive data leaks because someone forgot to apply the policy.
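The masking policy itself is ordinary UC DDL – a sketch with hypothetical function and column names:

```sql
-- Hypothetical column mask: only the pii-readers group sees real emails
CREATE OR REPLACE FUNCTION main.governance.mask_email(email STRING)
RETURN CASE
  WHEN is_account_group_member('pii-readers') THEN email
  ELSE '***@***'
END;

-- Attached once; every branch of this table inherits the mask at creation
ALTER TABLE lakebase_bs.public.final_entities
  ALTER COLUMN owner_email SET MASK main.governance.mask_email;
```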
This matters more than it might seem. According to Perforce’s 2025 State of Data Compliance report, 60% of organizations have experienced breaches or theft in non-production environments where sensitive data was inadequately anonymized. The traditional approach – manually masking data when provisioning dev/test environments – doesn't scale when environments are created and destroyed in seconds. Governance has to be automatic, or it doesn't happen.
The audit trail and cost attribution data also signal a quieter shift: the DBA's role is evolving from reactive ticket work to strategic platform architecture.
Today, much of a DBA's time goes to operational requests – environment provisioning, schema reviews, data refreshes, access grants. A six-developer team can generate 30+ tickets per sprint, and the DBA's calendar becomes a queue. The expertise that makes DBAs valuable – understanding data integrity, performance, and governance at a deep level – gets buried under repetitive provisioning work.
When branching is self-service and governance is automatic, that repetitive work falls away. Developers provision their own environments in one second. Schema changes are reviewed asynchronously in pull requests – the DBA sees a formatted schema diff posted by CI, reviews it on their own schedule, and approves or requests changes through the normal PR workflow. With the time now available, those reviews go deeper: the DBA helps team members understand the existing data and structures in production, works with them to arrive at better solutions, and conducts thorough reviews that uphold data integrity and governance standards. Data masking is enforced by policy, not by manual intervention. Cost attribution is automatic, not a monthly reconciliation exercise.
What opens up is the work that actually leverages the DBA's expertise: defining branching policies, designing governance rules, architecting promotion workflows, tuning performance, and establishing the guardrails that make self-service safe. The DBA shifts from doing the work to designing how the work gets done – from 30+ operational tickets per sprint to fewer than 5 high-value policy reviews. The audit trail demonstrated above isn't just a compliance artifact – it's the DBA's new strategic dashboard, a real-time view of how the platform is being used and where to invest next.
The DBA's pivot from operational tickets to platform design only works if the tooling shifts with the role. The platform has to do the routine work on its own, and the DBA needs a place to design how that work gets done.
Two open-source tools, both deployed as Databricks Apps and both governed by the same Unity Catalog grants and audit trail described above, close that loop.
LakebaseOps is what the platform does on its own. Three agents – Provisioning, Performance, and Health – cover 51 tasks a DBA would otherwise handle through tickets. Seven of those tasks run as scheduled Databricks Jobs, replacing the pg_cron crontab a DBA would otherwise hand-maintain. A monitoring UI surfaces live pg_stat metrics, slow-query regressions, branch TTL enforcement, and a 9-KPI adoption dashboard. A migration wizard scores ten source engines (Aurora, RDS, Cloud SQL, AlloyDB, Cosmos DB, and more) against Lakebase, with live pricing from the AWS and Azure APIs.
Lakebase MCP is what the DBA does on top of the platform: a Model Context Protocol server that exposes 46 tools to any MCP-capable AI agent (Claude, Copilot, GPT). The DBA stops opening pgAdmin and starts describing intent:
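A hypothetical prompt – the wording is illustrative, not a transcript from our sessions:

```
"Branch production, apply the pending Backstage migration on the branch,
run the test suite, and show me any query that regressed past 100 ms.
If everything passes, open a PR with the schema diff."
```

The agent translates that intent into the underlying tool calls; the DBA reviews outcomes rather than typing SQL.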
Two design choices keep this safe. First, dual-layer governance: a SQL-statement guard and a per-tool access guard, with four pre-built profiles (read_only, analyst, developer, admin) that map onto the same UC access patterns shown above. A coding assistant runs as read_only and physically cannot drop a table.
Second, every query is attributable – the server tags every statement with the originating tool:
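As a sketch of what that looks like on the wire – the tag format is an assumption about the server's implementation, but the comment-prefix technique is standard Postgres practice and shows up directly in pg_stat_activity:

```sql
-- A tagged statement as it might appear in pg_stat_activity.query
-- (tag key/value format is illustrative)
/* mcp_tool=execute_query mcp_client=claude */
SELECT count(*) FROM final_entities;
```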
Combined with the branch-level cost attribution shown earlier, you can answer "which agent on which branch generated the 4 AM CPU spike?" in one SQL query.
LakebaseOps runs for the team. Lakebase MCP runs with the team. Both inherit the governance posture you just saw.
In Part 3 of this series, we will look at the ultimate payoff: taking the infrastructure ownership data inside Backstage and joining it directly to cloud billing data in a single SQL query.