Announcing Lakebase Public Preview

Fully-managed Postgres for data apps and AI agents

Databricks Lakebase: Postgres for data apps and AI agents

Summary

  • Traditional databases are slow and expensive to provision, don’t scale well, are siloed from analytics platforms, and don’t fit into a modern developer workflow.
  • Lakebase is a fully managed Postgres database integrated with the lakehouse and built for AI.
  • Enterprises use Lakebase to serve data and features from the lakehouse, power standalone intelligent applications, and analyze operational data in the lakehouse.

At the Data and AI Summit, we introduced a new category of operational databases called lakebases for building intelligent applications. Today, we’re excited to announce the Public Preview of Databricks Lakebase, the first fully-managed Postgres database built for data apps and AI. 

Customers are combining their operational and analytical data to build intelligent applications: serving features and models, building standalone applications, or analyzing operational data in a lakehouse. But they continue to struggle with provisioning, scaling, and the lack of a modern developer experience for data, because databases have seen little innovation in recent decades.

Lakebases provide a solution for the AI era. In this blog, we’ll introduce the key Databricks Lakebase features and benefits, and outline how customers already use Lakebase today.

Introducing Lakebase

OLTP databases have not fundamentally changed since the 1990s. Even when deployed in the cloud, these legacy databases are slow and expensive to provision and manage. Operational databases are typically deployed in a separate stack from the analytics platform, creating silos between transactional and analytical data. They also do not fit the modern development workflow that AI development needs. The traditional architecture typically involves separate databases for development, testing, staging, and production environments, each provisioned, populated, and maintained separately.

Databricks Lakebase is a first-of-its-kind database built on open source standards, with a highly scalable architecture based on the separation of compute and storage and designed specifically for modern application development. Lakebase is deeply integrated into the lakehouse to make it easier to combine operational, analytical, and AI stacks.

Built on open source Postgres 

Over the last 7 years, Postgres has become the most popular database in the developer community and is the de facto database choice for modern applications. It’s open source, has a vibrant ecosystem of extensions, and is supported by a robust community of libraries, tools, and frameworks. Engineers already know how to work with it, and foundation models are trained on the vast amount of material available for the Postgres ecosystem, making it very accessible to intelligent applications and agents.

With support for popular extensions such as PostGIS and pgvector, and a broad ecosystem of drivers and tools, Lakebase provides a rich set of capabilities that will be familiar to development teams. 
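As a rough illustration of what pgvector support means in practice, the sketch below prepares the SQL for a nearest-neighbor search. The table name `documents`, its columns, and the 3-dimensional embedding are hypothetical, and the `%s` placeholders assume a psycopg-style driver; nothing here is specific to Lakebase beyond it being Postgres with the `vector` extension available.

```python
from typing import Sequence, Tuple

# Standard pgvector setup statements (table and column names are hypothetical).
ENABLE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS documents ("
    "id bigserial PRIMARY KEY, body text, embedding vector(3));"
)

# `<->` is pgvector's Euclidean (L2) distance operator.
NEAREST_SQL = (
    "SELECT id, body FROM documents "
    "ORDER BY embedding <-> %s::vector LIMIT %s;"
)


def to_vector_literal(values: Sequence[float]) -> str:
    """Format numbers as a pgvector text literal such as '[1.0,0.25,-2.0]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def nearest_neighbors_query(
    query_embedding: Sequence[float], limit: int = 5
) -> Tuple[str, Tuple[str, int]]:
    """Return the SQL text and parameters for a k-nearest-neighbor lookup."""
    return NEAREST_SQL, (to_vector_literal(query_embedding), limit)
```

With a driver such as psycopg, the returned pair could be passed straight to `cursor.execute(sql, params)` after running the two setup statements once.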

Separation of Compute and Storage

Lakebase uses an architecture that separates compute from storage, so each can scale independently while supporting low-latency (<10 ms), high-concurrency (>10k QPS) transactions.

Lakebase is fully managed by Databricks, which means there’s no infrastructure to provision or maintain. The result is a database service that removes friction from both infrastructure and development processes, allowing teams to move faster without compromising control or reliability.

  • High availability with readable secondaries: Multi-zone high availability protects against zonal failure by provisioning secondary compute resources across zones. Secondaries can optionally be readable to provide isolation and horizontal scaling of read workloads.
  • Data storage and recovery: All transactions are persisted to encrypted storage that is regionally durable and so protected against any single zone failure. Point-in-time recovery is available via a data protection window that provides up to 35 days of recovery time.
  • Branching for an isolated test environment or point-in-time recovery: Lakebase uses copy-on-write branching to create an instant zero-copy clone of the database, together with dedicated compute to operate on that branch. The child branch is managed independently of the main parent branch, and can be created from the data in the parent at the current point in time, at a previous point in time, or at a specific Log Sequence Number (LSN). This can be used to create an isolated test environment with production data or for point-in-time recovery operations.
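To make the branching semantics above more concrete, here is a toy in-memory model of copy-on-write branching keyed by LSN. It is a conceptual sketch only and not how Lakebase or Neon implement storage; the `Branch` class and its methods are invented for illustration. A child stores only its own writes and reads everything else from the parent's history up to the LSN it was branched at.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Branch:
    """Toy copy-on-write branch: holds only its own writes, never a data copy."""

    name: str
    parent: Optional["Branch"] = None
    parent_lsn: int = 0  # parent history is visible up to and including this LSN
    log: List[Tuple[int, str, str]] = field(default_factory=list)  # (lsn, key, value)

    @property
    def lsn(self) -> int:
        """Latest LSN on this branch (the branch point if nothing was written)."""
        return self.log[-1][0] if self.log else self.parent_lsn

    def write(self, key: str, value: str) -> int:
        """Append a write to this branch only and return its LSN."""
        new_lsn = self.lsn + 1
        self.log.append((new_lsn, key, value))
        return new_lsn

    def read(self, key: str, at_lsn: Optional[int] = None) -> Optional[str]:
        """Read the newest value at or before at_lsn, falling back to the parent."""
        limit = self.lsn if at_lsn is None else at_lsn
        for entry_lsn, k, v in reversed(self.log):
            if entry_lsn <= limit and k == key:
                return v
        if self.parent is not None:
            # Never look past the point at which this branch was created.
            return self.parent.read(key, min(limit, self.parent_lsn))
        return None

    def branch(self, name: str, at_lsn: Optional[int] = None) -> "Branch":
        """Create a child at the current LSN, or at an earlier one for recovery."""
        point = self.lsn if at_lsn is None else at_lsn
        return Branch(name=name, parent=self, parent_lsn=point)
```

In this model, creating a branch costs the same regardless of how much data the parent holds, writes on a test branch never reach the parent, and a branch created with an earlier `at_lsn` sees the data as it was at that point, which is the point-in-time recovery case.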

Modern DevEx, Built for AI

Lakebase is built on Neon technology, which provides copy-on-write branching and autoscaling serverless compute. Copy-on-write branching makes it possible to instantly create a new database with the same data and schema as an existing database, without affecting the original. The new database is inexpensive because it does not duplicate the underlying data. Serverless compute starts in under a second and autoscales with demand, and scaling to zero keeps compute costs low when the database is idle.

Combined, serverless autoscaling of compute and branching capabilities completely change the development paradigm for applications. Developers can instantly create a database branch to match each git branch and don’t have to worry about standing up new database instances, sampling data for dev or testing environments, or hydrating multiple databases.

For developers and agents alike, this means that ephemeral database environments can be rapidly created, used, and decommissioned at virtually no cost, with virtually no effort.

The full Neon developer experience in Lakebase and many more exciting features will be coming soon.

Integrated with the lakehouse

Lakebase integrates a transactional database layer with the lakehouse and inherits the operational maturity of the Databricks Platform, including observability, security, and access controls. Lakebase syncs with Unity Catalog managed tables, making it fast and easy to combine operational, analytical, and AI workloads without custom ETL pipelines. As a result, you can build intelligent applications that consume features or predictions generated in the lakehouse and update the analytical layer with fresh operational data, all within a unified platform.

  • Fully managed data synchronization: Easy-to-configure synchronization pipelines provide a simple, scalable way to move data between Unity Catalog managed tables and Lakebase. Sync modes include one-off Snapshot, Triggered, and Continuous.
  • Feature and Model serving: Serve machine learning features and models for applications with Lakebase as the online feature store, and the lakehouse as the offline store for training and analysis.
  • Unified governance: Take advantage of native integration with Unity Catalog and Databricks identity to simplify access control across the platform. Leverage Databricks Identity and OAuth to maintain a consistent identity across your operational and analytical users. Register a Postgres database in Unity Catalog to provide unified governance and access control for analytics users.
  • Databricks Apps integration: Build and deploy full-stack applications on Databricks with Lakebase powering transactional interactions. Databricks Apps support Lakebase as a native resource type.
  • Unified development environment: Use the Databricks SQL Editor to directly query Lakebase as well as browse data.
  • Built-in monitoring: Provides key database metrics such as transactions per second, the number of open connections, and resource utilization.
  • Network security: Lakebase is integrated with Databricks’ enterprise network security features, including PrivateLink and IP ACLs, to provide consistent network security.
  • Multi-cloud: Lakebase is available across cloud providers without replatforming. At Public Preview, Lakebase is available on Azure and AWS, with support for Google Cloud Platform to be added in the future.
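Because Lakebase is Postgres, applications connect with ordinary Postgres drivers. The sketch below builds a standard libpq-style connection URI. It assumes that a short-lived OAuth token is supplied in place of a password, which fits the identity model described above but should be checked against the Lakebase documentation; the host, database, and user values are placeholders, and `build_postgres_uri` is a hypothetical helper.

```python
from urllib.parse import quote


def build_postgres_uri(
    host: str, database: str, user: str, token: str, port: int = 5432
) -> str:
    """Build a libpq connection URI, percent-encoding reserved characters.

    Assumption: `token` is an OAuth access token used as the password.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(token, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode=require"
    )


# Placeholder values for illustration only.
uri = build_postgres_uri(
    host="db.example.com",
    database="appdb",
    user="alice@example.com",
    token="tok/en+1",
)
```

Percent-encoding matters here because identity-based user names often contain `@` and tokens can contain characters such as `/` or `+`, all of which would otherwise break URI parsing. Since tokens expire, a long-running application would need to rebuild the URI with a fresh token before opening new connections.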

Customers are using Lakebase

With hundreds of customers in the Private Preview program, it has been exciting to see the variety of use cases, including:

  • Serving data and features from the lakehouse for applications such as personalized recommendations and customer segmentation.
  • Building applications and agents for order processing, interactive workflow sign-off, and chatbots.
  • Analyzing operational data by syncing it to the lakehouse, for example order history for historical analysis or chatbot history for training data.

At Heineken, our goal is to become the best-connected brewer. To do that, we needed a way to unify all of our datasets to accelerate the path from data to value. Databricks has long been our foundation for analytics, creating insights such as product recommendations and supply chain enhancements. Our analytical data platform is now evolving to be an operational AI data platform and needs to deliver those insights to applications at low latency.
—Jelle Van Etten, Head of Global Data Platform, Heineken

At Tibber, empowering customers to take control of their energy consumption requires a flexible data infrastructure. Lakebase’s integration with Databricks makes it easy to serve analytical and transactional data helping us deliver real-time insights to our customers.
— Niklas Nordansjö, Data Platform Lead, Tibber AS

A strong partner network helps Lakebase customers work with their existing technology partners and System Integrators for data integration, business intelligence, and governance. We're excited to have an amazing group of industry launch partners for Lakebase.

At dbt Labs, we're changing how data engineering is done. With Databricks' new Lakebase, our joint customers will now be able to combine low-latency, transactional data and analytical data into one platform on Databricks. This will help us both deliver enterprise-scale AI for our customers. We can't wait to usher in the new era of analytics with Databricks.
— Ryan Segar, Chief Product Officer, dbt Labs

Summary

Lakebase combines the familiarity and extensibility of Postgres, the scalability of a modern serverless architecture, and a modern developer experience with the unified data experience of the lakehouse and the operational maturity of the Databricks Data Intelligence Platform. By combining these elements into a single, fully managed offering, Lakebase enables teams to build intelligent, data-driven applications without the operational complexity traditionally associated with transactional systems.

Lakebase is available in Public Preview with pricing available here. If you're looking to build applications that incorporate analytics and AI, it's the missing piece of your stack, ready to accelerate development and simplify operations. If you are a Workspace or Account administrator, you can enable it directly from your Databricks Account. Try it out today!
