Database Branching in Postgres: Git-Style Workflows with Databricks Lakebase

Databricks Lakebase brings copy-on-write database branching to Postgres, so your database finally works like the rest of your development stack.

Published: April 10, 2026

Product | 7 min read

Summary

  • Database branching in Databricks Lakebase Postgres uses copy-on-write storage to create isolated environments in seconds, without duplicating data.
  • Lakebase branches replace shared staging databases and pg_dump workflows by giving each developer, pull request, or CI test run its own isolated environment.
  • Branches also power instant point-in-time recovery and programmable ephemeral databases for AI agents, all through the same API.

The database is the last bottleneck in your dev workflow

Database branching is the missing primitive in modern development workflows. Every other part of the stack has evolved to support fast iteration. Code has Git. Infrastructure has Terraform. Deploys have CI/CD pipelines that run in minutes. But relational databases still work the way they did ten years ago.

Most teams share a single staging database. Within days of being set up, that database drifts out of sync with production. Schemas diverge as developers apply migrations in different orders. Sequence values no longer match. Test data accumulates and pollutes results. Someone eventually reseeds the whole thing, and the cycle starts over.

Setting up a new environment is worse. The standard approach is to run pg_dump against production, wait for it to finish (minutes to hours depending on database size), load it into a new instance, configure access, and hope the result actually reflects what is running in production. For a 500GB database, this means a 500GB copy operation, plus the compute and storage to keep it running.

The result is predictable. Teams avoid creating new environments because they are too expensive and too slow. Developers share a single mutable staging database. Migrations get tested against stale data, or not tested at all. Preview deployments run against empty fixtures instead of realistic schemas. CI tests share state and produce flaky results.

The database becomes the part of the stack that developers are afraid to touch.

Databricks Lakebase Postgres changes this with database branching.

What database branching actually is

A database branch is not a database copy. This distinction matters because it changes the economics of isolated environments entirely.

When you copy a database, you duplicate all of its data and schema into a new, independent instance. The time and cost scale linearly with the size of the database. Every copy is a full clone, and every clone starts going stale the moment it is created.

A branch works differently. When you create a branch in Lakebase, you get a new, fully isolated Postgres environment that:

  • Starts from the exact schema and data of its parent at a specific point in time
  • Shares the same underlying storage instead of duplicating it
  • Only writes new data when you actually make changes

This is called copy-on-write. As long as two branches have not diverged, they reference the same stored data. When you run a migration, insert rows, or modify tables on a branch, only those changes are written separately. Everything else is shared with the parent.
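To make the idea concrete, here is a toy model of copy-on-write in Python. It is a sketch under simplifying assumptions, not how Lakebase's storage engine is implemented: data lives in a shared store of immutable page versions, and each branch keeps its own small page table pointing at them. Creating a branch copies only the page table. A write stores one new page version that only the writing branch points to. The names `PageStore` and `Branch` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PageStore:
    """Shared storage of immutable page versions (toy model, not Lakebase internals)."""
    versions: list[bytes] = field(default_factory=list)

    def put(self, data: bytes) -> int:
        # Append a new version; existing versions are never modified in place.
        self.versions.append(data)
        return len(self.versions) - 1


@dataclass
class Branch:
    store: PageStore
    page_table: dict[int, int]  # page number -> index of the version this branch sees

    def read(self, page: int) -> bytes:
        return self.store.versions[self.page_table[page]]

    def write(self, page: int, data: bytes) -> None:
        # Copy-on-write: store a new version and repoint only this branch.
        self.page_table[page] = self.store.put(data)

    def branch(self) -> Branch:
        # A new branch copies the page table (metadata), not the page data.
        return Branch(self.store, dict(self.page_table))


store = PageStore()
main = Branch(store, {n: store.put(f"row-{n}".encode()) for n in range(1_000)})
dev = main.branch()                 # no page data written
dev.write(7, b"migrated row 7")     # one new page version, visible only on dev

print(len(store.versions))          # 1001: 1000 shared pages + 1 changed page
print(main.read(7), dev.read(7))    # b'row-7' b'migrated row 7'
```

Unchanged pages (for example page 8) are the very same stored object on both branches, which is why branch creation cost does not grow with database size in this model.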

Database copy vs. database branch

| | Database copy (pg_dump, RDS snapshot) | Database branch (Lakebase) |
|---|---|---|
| Time to create | Minutes to hours, scales with database size | Seconds, constant regardless of database size |
| Storage cost | Full duplicate of all data | Proportional to changes only (copy-on-write) |
| Isolation | Full, but expensive to maintain | Full, with independent compute and connection strings |
| Freshness | Stale from the moment it is created | Starts from the exact state of the parent at branch time |
| Cleanup | Manual teardown of instances and storage | Delete the branch; compute and storage are reclaimed automatically |

In practical terms, this means:

  • Branch creation takes seconds, regardless of database size. A 10GB database and a 2TB database branch in the same amount of time.
  • Storage cost is proportional to changes, not total data size. A branch that modifies 50MB of data in a 500GB database uses roughly 50MB of additional storage.
  • Each branch gets its own Postgres connection string and compute endpoint. Branches are fully isolated from each other and from their parent.
  • Idle branches automatically scale compute to zero. You only pay for active compute when a branch is actually being used.
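The storage point is simple arithmetic, but it compounds quickly once you have several environments. The sketch below compares idealized extra storage for ten hypothetical environments (say, ten open pull requests), each modifying about 50MB of a 500GB database. It uses decimal units and assumes extra storage is exactly the changed bytes, ignoring metadata overhead and any history retained for the restore window; `extra_storage` is an illustrative helper, not part of any API.

```python
from __future__ import annotations

MB = 10**6
GB = 10**9


def extra_storage(parent_bytes: int, changed_bytes: list[int], strategy: str) -> int:
    """Idealized extra storage for one environment per entry in changed_bytes."""
    if strategy == "copy":
        # Each copy duplicates the full parent, no matter how little it changes.
        return parent_bytes * len(changed_bytes)
    if strategy == "branch":
        # Each branch stores only what it writes; unchanged data is shared.
        return sum(changed_bytes)
    raise ValueError(f"unknown strategy: {strategy}")


changes = [50 * MB] * 10  # ten environments, each modifying ~50MB
print(extra_storage(500 * GB, changes, "copy") // GB)    # 5000 (GB)
print(extra_storage(500 * GB, changes, "branch") // MB)  # 500 (MB)
```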

Branches are designed to be created, used, and discarded freely, whether by developers, CI pipelines, AI agents, or other automation. They are not precious environments that need to be maintained. They are disposable, like Git branches.

The architecture that makes database branching possible

Traditional managed Postgres (RDS, AlloyDB, Azure Database for PostgreSQL) ties compute and storage together. The database process and its data live on the same instance, and the data is stored as a single mutable filesystem. That is why copying is the only option for creating a second environment: you have to duplicate the filesystem.

But a lakebase is built differently. It separates compute from storage completely. All data is written to a distributed, versioned storage engine that records every change as a new version rather than overwriting existing data. This log-structured architecture is what makes database branching possible as a primitive rather than as a feature layered on top.

Because storage is versioned, multiple branches can reference the same underlying data without risk of conflict. Because compute is independent, each branch runs its own Postgres process and scales on its own. Non-production branches that sit idle scale down to zero automatically and restart in milliseconds when a connection comes in.
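A minimal sketch of "record a new version instead of overwriting" follows, assuming a simple key-value model where each write gets a monotonically increasing log sequence number (LSN). Lakebase's real storage engine is not modeled here; the point is only that a reader pinned to an older LSN keeps seeing the old value while newer writes land alongside it, so the two never conflict. `VersionedStore` is a hypothetical name.

```python
from __future__ import annotations

from bisect import bisect_right
from collections import defaultdict


class VersionedStore:
    """Toy multi-version key-value store: every write is appended with a
    new LSN, and nothing is overwritten (illustration only)."""

    def __init__(self) -> None:
        self.lsn = 0
        self._history: dict[str, list[tuple[int, object]]] = defaultdict(list)

    def write(self, key: str, value: object) -> int:
        self.lsn += 1
        self._history[key].append((self.lsn, value))
        return self.lsn

    def read(self, key: str, as_of: int | None = None) -> object | None:
        """Return the newest value of `key` written at or before LSN `as_of`."""
        as_of = self.lsn if as_of is None else as_of
        versions = self._history.get(key, [])
        i = bisect_right([lsn for lsn, _ in versions], as_of)
        return versions[i - 1][1] if i else None


store = VersionedStore()
before = store.write("account:42:plan", "free")
store.write("account:42:plan", "pro")

# A branch pinned at `before` keeps seeing "free"; the parent sees "pro".
print(store.read("account:42:plan", as_of=before))  # free
print(store.read("account:42:plan"))                # pro
```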

Not all database branching implementations are equal. Some platforms create full instance copies and call them branches. Others branch only the schema, without data. Lakebase branches include both schema and data, use copy-on-write at the storage layer to avoid duplication, and provide independent, autoscaling compute per branch. This is what makes it practical to create branches freely and at scale, without provisioning additional infrastructure.

This architecture also enables time travel. Because every version of the data is retained within a configurable restore window, you can create a branch from any point in the past, not just from the current state. This is what powers instant point-in-time recovery: instead of replaying the write-ahead log (WAL) or restoring a backup, you create a branch at the timestamp you need and read directly from it.
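Branching "at a timestamp" ultimately means picking a position in the change history. The sketch below shows one way to resolve a timestamp to the last committed LSN while rejecting targets outside the restore window. The commit log, the 7-day window, and the `lsn_at` helper are all hypothetical; this is not the Lakebase API.

```python
from __future__ import annotations

from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def lsn_at(commits: list[tuple[datetime, int]], target: datetime,
           now: datetime, restore_window: timedelta) -> int:
    """Return the LSN of the last commit at or before `target`.

    `commits` is a time-ordered list of (commit_time, lsn) pairs.
    """
    if target > now:
        raise ValueError("cannot branch from the future")
    if target < now - restore_window:
        raise ValueError("target is outside the restore window")
    i = bisect_right([t for t, _ in commits], target)
    if i == 0:
        raise ValueError("no commit at or before target")
    return commits[i - 1][1]


now = datetime(2026, 4, 10, 12, 0, tzinfo=timezone.utc)
commits = [
    (now - timedelta(hours=3), 100),
    (now - timedelta(hours=2), 250),
    (now - timedelta(minutes=5), 400),  # suppose this commit dropped a table
]
# Branch from just before the bad change to read the table as it was.
print(lsn_at(commits, now - timedelta(minutes=10), now, timedelta(days=7)))  # 250
```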

What database branching unlocks for your team

Once database branching is a fast, cheap primitive instead of an expensive copy operation, new workflows become practical. Here is a summary of the most common patterns. (We cover each of these in detail in the next post in this series.)

One branch per developer. Every engineer gets their own isolated environment with production-like data. No more stepping on each other's changes in a shared dev database. When a branch drifts too far from production, reset it in one command to pull in the latest schema and data. Because branches scale to zero when idle, this pattern stays affordable even on large teams.
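Why scale-to-zero matters for this pattern is easiest to see in compute-hours. The numbers below are hypothetical (20 engineers, about 6 active hours per workday, 22 workdays, one compute unit per branch), and the model ignores autoscaling, cold starts, and pricing; it only contrasts always-on environments with ones that run while someone is connected.

```python
def monthly_compute_hours(developers: int, active_hours_per_day: float,
                          workdays: int, scale_to_zero: bool,
                          days_in_month: int = 30) -> float:
    """Compute-hours for one dev branch per engineer (one compute unit each)."""
    if scale_to_zero:
        # Compute runs only while a developer is actually connected.
        return developers * active_hours_per_day * workdays
    # Always-on environments run around the clock, every day.
    return developers * 24 * days_in_month


team = dict(developers=20, active_hours_per_day=6, workdays=22)
print(monthly_compute_hours(**team, scale_to_zero=True))   # 2640
print(monthly_compute_hours(**team, scale_to_zero=False))  # 14400
```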

One branch per pull request. Automate branch creation when a PR opens and deletion when it merges or closes. Preview deployments on Vercel or Netlify each get their own database branch, so your frontend preview is backed by realistic, isolated data. Migrations run against real data shapes and constraints, not empty test fixtures. This is the workflow that teams adopt first, and it tends to be the one that convinces them to adopt database branching across the board.
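A sketch of the automation, assuming a GitHub `pull_request` webhook (whose payload carries `action` and `number`) and a hypothetical `BranchClient` interface that you would map onto the real Lakebase API or CLI. The `pr-<number>` naming scheme and the `production` parent are illustrative. The handler is idempotent, so a retried webhook neither creates a duplicate branch nor fails on a missing one.

```python
from __future__ import annotations

from typing import Protocol


class BranchClient(Protocol):
    """Hypothetical client interface; map these to the real Lakebase API or CLI."""
    def exists(self, name: str) -> bool: ...
    def create(self, name: str, parent: str) -> None: ...
    def delete(self, name: str) -> None: ...


def handle_pull_request(event: dict, client: BranchClient,
                        parent: str = "production") -> str | None:
    """React to a GitHub `pull_request` webhook payload.

    Returns the branch name for actions that create or keep the branch,
    otherwise None.
    """
    name = f"pr-{event['number']}"
    action = event["action"]
    if action in ("opened", "reopened", "synchronize"):
        if not client.exists(name):  # idempotent: retried webhooks are harmless
            client.create(name, parent=parent)
        return name
    if action == "closed":  # fires for both merged and unmerged closes
        if client.exists(name):
            client.delete(name)
    return None  # other actions (labeled, edited, ...) are ignored


class InMemoryClient:
    """Stand-in for testing the handler without a real database."""
    def __init__(self) -> None:
        self.branches: dict[str, str] = {}

    def exists(self, name: str) -> bool:
        return name in self.branches

    def create(self, name: str, parent: str) -> None:
        self.branches[name] = parent

    def delete(self, name: str) -> None:
        del self.branches[name]


client = InMemoryClient()
print(handle_pull_request({"action": "opened", "number": 128}, client))  # pr-128
handle_pull_request({"action": "closed", "number": 128}, client)
print(client.branches)  # {}
```

A preview deployment would then receive the connection string of `pr-<number>`; how that string is fetched depends on the real API and is not shown.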

One branch per test run. CI pipelines get a fresh, isolated database for every run. No leftover state from previous tests. No waiting for an empty container image to spin up and then be seeded with fake data. No flaky results caused by shared data or test ordering dependencies. Every run starts from the same baseline. For tests that require deterministic data, you can create branches from a fixed point in time or a specific Log Sequence Number (LSN).
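In a test harness, the key property is that the branch is deleted even when tests fail. A minimal sketch using `contextlib.contextmanager` is below. The `create(name, parent, lsn)` and `delete(name)` methods belong to a hypothetical client (an in-memory stand-in here, not the Lakebase SDK), and passing a fixed `lsn` reflects the deterministic-baseline option described above. In real use, you would hand the yielded connection string to your Postgres driver.

```python
from __future__ import annotations

import contextlib
import uuid
from typing import Iterator, Optional


class InMemoryClient:
    """Stand-in for a real branch API (hypothetical create/delete methods)."""
    def __init__(self) -> None:
        self.branches: dict[str, tuple[str, Optional[int]]] = {}

    def create(self, name: str, parent: str, lsn: Optional[int] = None) -> str:
        self.branches[name] = (parent, lsn)
        return f"postgresql://ci@example.invalid:5432/{name}"  # fake connection string

    def delete(self, name: str) -> None:
        self.branches.pop(name, None)


@contextlib.contextmanager
def ephemeral_branch(client, parent: str = "main", lsn: Optional[int] = None,
                     run_id: Optional[str] = None) -> Iterator[str]:
    """Create a branch for one CI run and always delete it afterwards."""
    name = f"ci-{run_id or uuid.uuid4().hex[:8]}"
    dsn = client.create(name, parent=parent, lsn=lsn)
    try:
        yield dsn
    finally:
        # Runs whether the test body passes or raises.
        client.delete(name)


client = InMemoryClient()
with ephemeral_branch(client, lsn=12345, run_id="build-42") as dsn:
    print(dsn)              # postgresql://ci@example.invalid:5432/ci-build-42
    print(client.branches)  # {'ci-build-42': ('main', 12345)}
print(client.branches)      # {}
```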

Instant recovery. Create a branch from any point in time within your restore window. Inspect dropped tables, debug failed migrations, or audit historical data, all without touching production. Use schema diff to compare the state before and after a change. Export what you need from the recovery branch and then delete it. The whole process takes seconds, not the hours or days that traditional PITR requires.
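If you want to see what a schema comparison involves, one approach (a sketch, not Lakebase's built-in schema diff) is to run `SELECT table_name, column_name, data_type FROM information_schema.columns WHERE table_schema = 'public'` on both the recovery branch and the current database, then compare the two result sets. The sample rows below are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

Row = tuple[str, str, str]  # (table_name, column_name, data_type)


def to_schema(rows: list[Row]) -> dict[str, dict[str, str]]:
    schema: dict[str, dict[str, str]] = defaultdict(dict)
    for table, column, dtype in rows:
        schema[table][column] = dtype
    return dict(schema)


def schema_diff(before_rows: list[Row], after_rows: list[Row]) -> list[str]:
    """List dropped/added tables, then dropped/added/retyped columns."""
    before, after = to_schema(before_rows), to_schema(after_rows)
    changes = [f"- table {t}" for t in sorted(before.keys() - after.keys())]
    changes += [f"+ table {t}" for t in sorted(after.keys() - before.keys())]
    for table in sorted(before.keys() & after.keys()):
        b, a = before[table], after[table]
        changes += [f"- column {table}.{c}" for c in sorted(b.keys() - a.keys())]
        changes += [f"+ column {table}.{c} {a[c]}" for c in sorted(a.keys() - b.keys())]
        changes += [f"~ column {table}.{c}: {b[c]} -> {a[c]}"
                    for c in sorted(b.keys() & a.keys()) if b[c] != a[c]]
    return changes


before = [("users", "id", "integer"), ("users", "email", "text"),
          ("orders", "id", "integer")]
after = [("users", "id", "bigint"), ("users", "email", "text"),
         ("users", "plan", "text")]
for line in schema_diff(before, after):
    print(line)
# - table orders
# + column users.plan text
# ~ column users.id: integer -> bigint
```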

Ephemeral environments for AI agents. AI agents can provision databases programmatically via the Lakebase API, use them for the duration of a task, and shut them down when done. Platforms can build versioning on top of snapshots: every agent action creates a checkpoint, and users can jump between versions instantly. If an agent runs a bad migration or corrupts data, rolling back is a single API call.
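A sketch of the checkpoint bookkeeping such a platform might keep, assuming the branch API accepts an LSN to branch from (as with the per-test-run branches above). After each agent action, the platform could capture a position with Postgres's `pg_current_wal_lsn()`, which returns `pg_lsn` text such as `0/16B3748`. `AgentSession`, `Checkpoint`, and the shape of the restore request are hypothetical. Rolling back means branching from an earlier checkpoint, and because no checkpoint is discarded, users can still jump forward again.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def parse_pg_lsn(text: str) -> int:
    """Convert Postgres pg_lsn text such as '0/16B3748' into an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass(frozen=True)
class Checkpoint:
    label: str
    lsn: int


@dataclass
class AgentSession:
    """Hypothetical bookkeeping for one agent's database branch."""
    branch: str
    checkpoints: list[Checkpoint] = field(default_factory=list)

    def record(self, label: str, pg_lsn: str) -> Checkpoint:
        # Call after each agent action, e.g. with the result of
        # SELECT pg_current_wal_lsn() taken once the action has committed.
        cp = Checkpoint(label, parse_pg_lsn(pg_lsn))
        self.checkpoints.append(cp)
        return cp

    def restore_request(self, label: str) -> dict[str, object]:
        # Rolling back = branching from the checkpoint's LSN. Existing
        # checkpoints are kept, so users can still jump forward again.
        for cp in reversed(self.checkpoints):
            if cp.label == label:
                return {"parent": self.branch, "lsn": cp.lsn,
                        "name": f"{self.branch}-{cp.label}"}
        raise KeyError(label)


session = AgentSession("agent-task-17")
session.record("create-schema", "0/16B3748")
session.record("bad-migration", "0/16C0000")
print(session.restore_request("create-schema"))
```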

Getting started

Database branching in Databricks Lakebase turns your Postgres database from the slowest part of your development workflow into the fastest.

You can create your first branch in under a minute using the console, CLI, or API. Here is what it looks like from the CLI:

That is it. You now have an isolated Postgres environment with the full schema and data from production, ready to use.

If you are building on Postgres and tired of the overhead that comes with managing database environments, start with a single dev branch. Then try one per PR. Most teams that start with one database branching workflow quickly adopt the rest.

Databricks Lakebase is serverless Postgres built for agents and apps. Learn more at databricks.com/product/lakebase.
