In this blog, we propose a new architecture for OLTP databases called a lakebase. A lakebase is defined by openness, separation of storage and compute, serverless operation, deep integration with the lakehouse, a modern branch-based development workflow, and being built for AI agents.
Most technologies have some degree of lock-in, but few have more than traditional OLTP databases: they are monolithic, expensive, and tie customers tightly to a single vendor. As a result, there has been very little innovation in this space for decades.
At its core, a lakebase is grounded in battle-tested, open source technologies. This ensures compatibility with a broad ecosystem of tools and developer workflows. Unlike proprietary systems, lakebases promote transparency, portability, and community-driven innovation. They give organizations the confidence that their data architecture won’t be locked into a single vendor or platform.
Postgres is the leading open source standard for databases. It is the fastest-growing OLTP database on DB-Engines and leads the Stack Overflow developer survey as the most popular database by a wide margin. It has a mature engine with a rich ecosystem of extensions.
One of the most fundamental technical pillars of the lakehouse is the separation of storage and compute, which allows compute resources and storage resources to scale independently. Lakebases share the same architecture. This is more challenging to build for OLTP, because low-cost data lakes were not initially designed for the stringent workloads OLTP databases run, e.g. single-digit-millisecond latency and throughput of millions of transactions per second.
Note that some earlier attempts at separation of storage and compute have been made by various proprietary databases, such as several hyperscaler Postgres offerings. These are built on proprietary, closed storage systems that are inherently more expensive and do not expose open storage.
Lakebases build on these earlier attempts but leverage low-cost data lakes and truly open formats. Data is persisted in object stores in open formats (e.g. Postgres pages), and compute instances read directly from the data lake, using intermediate layers with soft state to improve performance.
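To make the idea concrete, here is a minimal sketch (in Python, with an in-memory dict standing in for the object store) of what such a read path could look like. `ObjectStore`, `PageCache`, and the key layout are illustrative assumptions, not any particular product's implementation:

```python
# Minimal sketch: a compute node reading 8 KB Postgres pages through a
# soft-state cache backed by an object store. All names are illustrative.

PAGE_SIZE = 8192  # Postgres heap pages are 8 KB

class ObjectStore:
    """Stand-in for a data lake bucket (e.g. S3) keyed by relation/page_no."""
    def __init__(self):
        self._blobs = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]

class PageCache:
    """Soft state: can be lost at any time and rebuilt from the object store."""
    def __init__(self, store: ObjectStore):
        self._store = store
        self._cache = {}

    def read_page(self, relation: str, page_no: int) -> bytes:
        key = f"{relation}/{page_no}"
        if key not in self._cache:        # cache miss: fall back to the lake
            self._cache[key] = self._store.get(key)
        return self._cache[key]

# Durability lives in the object store; compute only holds soft state.
store = ObjectStore()
store.put("orders/0", b"\x00" * PAGE_SIZE)
cache = PageCache(store)
page = cache.read_page("orders", 0)       # served from the lake, then cached
assert len(page) == PAGE_SIZE
```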
Traditional databases are heavyweight infrastructure that require a lot of management. Once provisioned, they typically run for years. If overprovisioned, you pay for capacity you never use. If underprovisioned, the database won't have the capacity to keep up with the application and may incur downtime to scale up.
A lakebase is lightweight and serverless. It spins up instantly when needed, scales down to zero when no longer necessary, and scales itself automatically as load changes. All of these capabilities are made possible by the separation of storage and compute.
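As a rough illustration, the control-plane logic behind scale-to-zero might look something like the following sketch. `ComputeEndpoint`, the idle timeout, and the suspend/resume hooks are hypothetical placeholders, not an actual API:

```python
# Sketch of scale-to-zero logic, assuming a hypothetical control plane that
# can suspend and resume compute independently of durable storage.

import time

IDLE_TIMEOUT_S = 300            # suspend compute after 5 minutes without queries

class ComputeEndpoint:
    def __init__(self):
        self.running = False
        self.last_activity = 0.0

    def handle_query(self, sql: str) -> None:
        if not self.running:    # cold start: storage is untouched,
            self.resume()       # only compute needs to spin back up
        self.last_activity = time.monotonic()
        # ... execute the query against pages fetched from the object store ...

    def resume(self) -> None:
        self.running = True     # e.g. attach a stateless compute instance

    def maybe_suspend(self) -> None:
        idle = time.monotonic() - self.last_activity
        if self.running and idle > IDLE_TIMEOUT_S:
            self.running = False    # scale to zero; data stays in the lake

endpoint = ComputeEndpoint()
endpoint.handle_query("SELECT 1")   # triggers a cold start, then serves the query
```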
In traditional architectures, operational databases and analytical systems are completely siloed. Moving data between them requires custom ETL pipelines, manual schema management, and separate sets of access controls. This fragmentation slows development, introduces latency, and creates operational overhead for both data and platform teams.
A lakebase solves this with deep integration into the lakehouse, enabling near real-time synchronization between operational and analytical layers. As a result, data becomes available quickly for serving in applications, and operational changes can flow back into the lakehouse without complex workflows, duplicated infrastructure, or egress costs incurred from moving data. Integration with the lakehouse also simplifies governance, with consistent data permissions and security.
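To illustrate the shape of that synchronization, here is a minimal sketch with in-memory stand-ins for the operational change feed and the analytical table. A real system would use logical replication and an open table format, so treat the names and structures here as illustrative assumptions:

```python
# Sketch of near real-time sync from the operational side to the analytical
# side, using in-memory stand-ins for the change feed and the lakehouse table.

operational_log = [                       # stand-in for a Postgres change feed
    {"lsn": 1, "op": "insert", "row": {"order_id": 1, "status": "new"}},
    {"lsn": 2, "op": "update", "row": {"order_id": 1, "status": "shipped"}},
]
lakehouse_table = {}                      # stand-in for the analytical copy

def sync(last_lsn: int) -> int:
    """Apply every change past the checkpoint, then advance the checkpoint."""
    for change in operational_log:
        if change["lsn"] > last_lsn:
            row = change["row"]
            lakehouse_table[row["order_id"]] = row
            last_lsn = change["lsn"]
    return last_lsn

checkpoint = sync(last_lsn=0)
assert lakehouse_table[1]["status"] == "shipped" and checkpoint == 2
```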
Today, virtually every engineer's first step in modifying a codebase is to create a new git branch of the repository. The engineer can make changes to the branch and test against it, fully isolated from the production branch. This workflow breaks down with databases. There is no "git checkout -b" equivalent for traditional databases, and as a result, database changes tend to be one of the most error-prone parts of the software development lifecycle.
Enabled by a copy-on-write technique built on the separation of storage and compute, lakebases support branching of the full database, including both schema and data, for high-fidelity development and testing. A new branch is created instantly and at extremely low cost, so it can be used whenever a "git checkout -b" is needed.
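Here is a minimal sketch of how copy-on-write branching can work over page-granular storage. The names (`Branch`, `write_page`, `read_page`) are illustrative, not a specific implementation; the key property is that creating a branch copies nothing, and pages diverge only when written:

```python
# Copy-on-write branching sketch: a branch starts as a pointer to its parent
# and only stores pages that have been written on that branch.

class Branch:
    def __init__(self, parent: "Branch | None" = None):
        self.parent = parent      # creating a branch copies nothing
        self.pages = {}           # only pages written on this branch live here

    def write_page(self, page_no: int, data: bytes) -> None:
        self.pages[page_no] = data            # copy-on-write: parent is untouched

    def read_page(self, page_no: int) -> bytes:
        if page_no in self.pages:
            return self.pages[page_no]
        if self.parent is not None:
            return self.parent.read_page(page_no)   # fall through to the parent
        raise KeyError(page_no)

main = Branch()
main.write_page(0, b"production data")

dev = Branch(parent=main)                 # the "git checkout -b" moment
dev.write_page(0, b"experimental data")   # test schema/data changes in isolation

assert main.read_page(0) == b"production data"   # production is untouched
assert dev.read_page(0) == b"experimental data"
```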
Neon's data show that over the course of the last year, the share of databases created by AI agents grew from 30% to over 80%. In other words, AI agents now create roughly four times as many databases as humans do. As the trend continues, in the near future 99% of databases will be created and operated by AI agents, often with humans in the loop. This will have profound implications for the requirements of database design, and we think lakebases will be best positioned to serve these agents.
If you think of AI agents as your own massive team of high-speed junior developers (potentially "mentored" by senior developers), the capabilities described above are tremendously helpful to them: instant provisioning, automatic scaling, and cheap, isolated branches map directly onto how agents work, as sketched below.
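As a sketch of what that could look like in practice, here is a hypothetical agent workflow over disposable branches. `create_branch`, `run_sql`, and `delete_branch` are trivial stand-ins rather than a real API; the shape of the loop, branch, try, validate, then promote or discard, is the point:

```python
# Hypothetical agent workflow: experiment on a disposable branch and only
# touch production after the change checks out. Helpers are stand-ins.

def create_branch(parent: str) -> str:
    return f"{parent}-agent-scratch"      # instant, copy-on-write in a lakebase

def run_sql(branch: str, sql: str) -> str:
    return f"ran on {branch}: {sql}"      # stand-in for executing a statement

def delete_branch(branch: str) -> None:
    pass                                  # branches are cheap to throw away

def agent_apply_change(migration_sql: str, validation_sql: str) -> bool:
    branch = create_branch(parent="main")     # isolated, full-fidelity copy
    try:
        run_sql(branch, migration_sql)        # experiment against real schema and data
        ok = "error" not in run_sql(branch, validation_sql)
        if ok:
            run_sql("main", migration_sql)    # promote only after validation passes
        return ok
    finally:
        delete_branch(branch)                 # never leaves junk in production

agent_apply_change("ALTER TABLE orders ADD COLUMN status text",
                   "SELECT count(*) FROM orders")
```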
Today, we are also announcing the Public Preview of our new database offering, also named Lakebase.
But more important than the product announcement, lakebase is a new OLTP database architecture that is far superior to the traditional database architecture. We believe it is how every OLTP database system should be built in the future.