
What Is a Lakebase?


Summary

  • Operational databases were not designed for today’s AI-driven applications. They sit outside the analytics and AI stack, require manual integration, and lack the flexibility needed for modern development workflows.
  • Lakebase introduces a new architecture for OLTP databases that includes separation of compute and storage, enabling independent scaling and branching.
  • Deeply integrated with the lakehouse, Lakebase simplifies operational data workflows. It eliminates fragile ETL pipelines and complex infrastructure, enabling teams to move faster and deliver intelligent applications on a unified data platform.

In this blog, we propose a new architecture for OLTP databases called a lakebase. A lakebase is defined by:

  • Openness: Lakebases are built on open source standards, e.g. Postgres.
  • Separation of storage and compute: Lakebases store their data in modern data lakes (object stores) in open formats, which enables scaling compute and storage separately, leading to lower TCO and eliminating lock-in.
  • Serverless: Lakebases are lightweight and can scale elastically and near-instantly, up and down, all the way to zero. At zero, the cost of a lakebase is just the cost of storing its data on cheap data lakes.
  • Modern development workflow: Branching a database should be as easy as branching a code repository, and it should be near instantaneous.
  • Built for AI agents: Lakebases are designed to support a large number of AI agents operating at machine speed, and their branching and checkpointing capabilities allow AI agents to experiment and rewind.
  • Lakehouse integration: Lakebases should make it easy to combine operational, analytical, and AI systems without complex ETL pipelines.

Openness

Most technologies have some degree of lock-in, but few match traditional OLTP databases, which are monolithic, expensive, and tied to a single vendor. As a result, there has been very little innovation in this space for decades.

At its core, a lakebase is grounded in battle-tested, open source technologies. This ensures compatibility with a broad ecosystem of tools and developer workflows. Unlike proprietary systems, lakebases promote transparency, portability, and community-driven innovation. They give organizations the confidence that their data architecture won’t be locked into a single vendor or platform.

Postgres is the leading open source standard for databases. It is the fastest growing OLTP database on DB-Engines and leads the StackOverflow developer survey as the most popular database by a wide margin. It has a mature engine with a rich ecosystem of extensions.

Separation of Storage and Compute

One of the most fundamental technical pillars of lakehouses is the separation of storage and compute. It enables independent scaling of compute resources and storage resources. Lakebases share the same architecture. This is more challenging to build because low-cost data lakes were not initially designed for the stringent workloads OLTP databases run, e.g. single-digit-millisecond latency and millions of transactions per second of throughput.

Note that some earlier attempts at separation of storage and compute have been made by various proprietary databases, such as several hyperscaler Postgres offerings. These are built on proprietary, closed storage systems that are inherently more expensive and do not expose open storage.

Lakebases build on these earlier attempts, but leverage low-cost data lakes and truly open formats. Data is persisted in object stores in open formats (e.g. Postgres pages), and compute instances read directly from the data lake, using intermediate caching layers with soft state to improve performance.
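The read path above can be sketched in a few lines. This is a toy model, not Lakebase's actual implementation: the "object store" is a plain dict standing in for S3/GCS/ADLS, and the cache is a small LRU holding soft state that is safe to lose because the object store remains the source of truth.

```python
from collections import OrderedDict

class CachedPageStore:
    """Sketch of a compute node reading fixed-size pages from an
    object store through a soft-state LRU cache."""

    def __init__(self, object_store, cache_pages=2):
        self.object_store = object_store  # durable, cheap, higher latency
        self.cache = OrderedDict()        # soft state: safe to lose
        self.cache_pages = cache_pages
        self.hits = self.misses = 0

    def read_page(self, page_id):
        if page_id in self.cache:         # fast path: serve from cache
            self.cache.move_to_end(page_id)
            self.hits += 1
            return self.cache[page_id]
        self.misses += 1                  # slow path: fetch from the lake
        page = self.object_store[page_id]
        self.cache[page_id] = page
        if len(self.cache) > self.cache_pages:
            self.cache.popitem(last=False)  # evict least recently used
        return page

store = CachedPageStore({1: b"page-1", 2: b"page-2", 3: b"page-3"})
store.read_page(1); store.read_page(1); store.read_page(2)
```

Because the cache holds only soft state, a compute instance can be killed or scaled down at any time without risking durability, which is what makes the serverless behavior described next possible.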

Serverless Experience

Traditional databases are heavyweight infrastructure that requires significant management. Once provisioned, they typically run for years. If overprovisioned, you pay for capacity you never use. If underprovisioned, the database lacks the capacity to meet the application's needs and may incur downtime to scale up.

A lakebase is lightweight and serverless. It spins up instantly when needed, and scales down to zero when no longer necessary. It scales itself automatically, as loads change. All of these capabilities are made possible by the separation of storage and compute architecture.
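A scale-to-zero autoscaling policy can be sketched as a simple function of load. The thresholds and the notion of a "compute unit" here are purely illustrative assumptions, not Lakebase's actual policy:

```python
def target_compute_units(active_connections, queries_per_sec, max_units=16):
    """Toy autoscaling policy: scale compute with load, all the way
    to zero when the database is idle."""
    if active_connections == 0 and queries_per_sec == 0:
        return 0  # scale to zero: pay only for storage on the data lake
    # one compute unit per 1000 QPS (ceiling), at least one while connected
    units = max(1, -(-queries_per_sec // 1000))
    return min(units, max_units)
```

The key point is the `return 0` branch: because all durable state lives in the object store, compute can disappear entirely while idle and be recreated on the next request.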

Lakehouse Integration

In traditional architectures, operational databases and analytical systems are completely siloed. Moving data between them requires custom ETL pipelines, manual schema management, and separate sets of access controls. This fragmentation slows development, introduces latency, and creates operational overhead for both data and platform teams. 

A lakebase solves this with deep integration into the lakehouse, enabling near real-time synchronization between operational and analytical layers. As a result, data becomes available quickly for serving in applications, and operational changes can flow back into the lakehouse without complex workflows, duplicated infrastructure, or egress costs incurred from moving data. Integration with the lakehouse also simplifies governance, with consistent data permissions and security.
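In spirit, this synchronization is a continuous replay of the operational change stream into the analytical layer, rather than a bespoke batch ETL job. The sketch below is a minimal change-data-capture loop; the shape of the log entries (operation, primary key, row) is an assumption for illustration:

```python
def apply_changes(analytical_table, change_log):
    """Minimal change-data-capture sketch: replay an operational
    change log (insert/update/delete by primary key) into an
    analytical copy of the table."""
    for op, key, row in change_log:
        if op in ("insert", "update"):
            analytical_table[key] = row   # upsert the latest row image
        elif op == "delete":
            analytical_table.pop(key, None)
    return analytical_table

orders = {}
log = [("insert", 1, {"status": "new"}),
       ("update", 1, {"status": "shipped"}),
       ("insert", 2, {"status": "new"}),
       ("delete", 2, None)]
apply_changes(orders, log)
```

Running this continuously over small batches is what keeps the analytical copy within seconds of the operational database, with no separate pipeline infrastructure to maintain.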

Modern Development Workflow

Today, virtually every engineer’s first step in modifying a codebase is to create a new git branch of the repository. The engineer can make changes to the branch and test against it, fully isolated from the production branch. This workflow breaks down with databases. There is no “git checkout -b” equivalent for traditional databases, and as a result, database changes tend to be one of the most error-prone parts of the software development lifecycle.

Enabled by a copy-on-write technique from the separation of storage and compute architecture, lakebases enable branching of the full database, including both schema and data, for high fidelity development and testing. This new branch is created instantly, and at extremely low cost, so it can be used whenever “git checkout -b” is needed.
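Copy-on-write is what makes branch creation instant: a new branch starts as a set of references to the parent's immutable pages, and page data is duplicated only when the branch writes. A minimal sketch of the idea (page contents are toy strings here, not real Postgres pages):

```python
class BranchablePages:
    """Copy-on-write branching sketch: creating a branch copies only
    page references, never page data, so it is near-instant and
    nearly free regardless of database size."""

    def __init__(self, pages=None):
        self.pages = pages if pages is not None else {}

    def branch(self):
        # instant: duplicate the reference table, not the pages themselves
        return BranchablePages(dict(self.pages))

    def write(self, page_id, data):
        self.pages[page_id] = data  # rebinds in this branch only

main = BranchablePages({1: "rows v1"})
dev = main.branch()          # the database analogue of `git checkout -b`
dev.write(1, "rows v2")      # main still sees "rows v1"
```

Writes on `dev` never touch `main`'s pages, which is why a branch gives a high-fidelity, fully isolated copy of both schema and data for testing.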

Built for AI Agents

Neon’s data show that over the course of the last year, the share of databases created by AI agents grew from 30% to over 80%. This means that AI agents today out-create humans by a factor of four. As the trend continues, in the near future, 99% of databases will be created and operated by AI agents, often with humans in the loop. This will have profound implications for the requirements of database design, and we think lakebases will be best positioned to serve these AI agents.


If you think of AI agents as your own massive team of high-speed junior developers (potentially “mentored” by senior developers), the aforementioned capabilities of lakebases will be tremendously helpful to AI agents:

  • Open source ecosystem: All frontier LLMs have been trained on the vast amount of public information available about popular open source ecosystems such as Postgres, so all AI agents are already experts in these systems.
  • Speed: Traditional databases were designed for humans to provision and operate. It was OK to take minutes to spin up a database. Given AI agents operate at machine speed, ultra rapid provisioning time becomes critical.
  • Elastic scaling and pricing: The separation of storage and compute serverless architecture enables extremely low-cost Postgres instances. It’s now possible to launch thousands or even millions of agents with their own databases cost-effectively, without requiring specialized engineers (e.g. DBAs) to maintain/support staging environments; this reduces TCO.
  • Branching and forking: AI agents can be non-deterministic, and “vibes” need to be checked and verified. The ability to instantly create a full copy of a database, not only for schema but also for the data, allows all these AI agents to be operating on their own isolated database instance in high fidelity for experimentation and validation.

Looking Forward

Today, we are also announcing the Public Preview of our new database offering, also named Lakebase.

But more important than the product announcement, lakebase is a new OLTP database architecture that is far superior to the traditional database architecture. We believe it is how every OLTP database system should be built in the future.
