Delta Lake

High performance for your open data lakehouse

Data Warehousing in the Era of AI

Build pipelines faster, automatically optimize performance and ask questions with natural language. Join us this May to learn how.

What is Delta Lake?

Delta Lake is the only open format storage layer that can automatically and instantly translate across open formats. Delta Lake unifies all data types for transactional, analytical and AI use cases out of the box, with support for streaming and batch operations. Delta Lake offers industry-leading performance and is the foundation of a cost-effective, highly scalable lakehouse.

Open and vast ecosystem

With Delta Lake Universal Format (UniForm), you will be able to use your favorite Iceberg or Hudi client to read your Delta tables through the Unity Catalog endpoint.

Delta Lake 3.0 simplifies the connector ecosystem. Delta Kernel offers a stable library API, making it easier for connectors to incorporate new Delta features without code changes.

Lightning-fast performance

Delta Lake on Databricks delivers massive scale and speed, with data loads and queries running up to 1.7x faster than with other storage formats.

AI-driven for best price/performance

Delta Lake with Unity Catalog and Photon offers the best price/performance out of the box without manual tuning. The Databricks Lakehouse uses AI models to solve common challenges with data storage, so you get faster performance without having to manually manage tables, even as they change over time.

Predictive I/O for updates optimizes your query plans and data layout for peak performance, intelligently balancing read vs. write performance. Get more from your data without needing to decide between strategies like copy-on-write vs. merge-on-read.

Liquid clustering delivers the performance of a well-tuned, well-partitioned table without the traditional headaches that come with partitioning, such as worrying about whether you can partition high-cardinality columns or expensive rewrites when changing partition columns. The result is lightning-fast, well-clustered tables with minimal configuration.

Predictive optimization automatically optimizes your data for the best performance and price. It learns from your data usage patterns, builds a plan for the right optimizations to perform, and then runs those optimizations on hyper-optimized serverless infrastructure.

Open and secure data sharing

Delta Sharing is the industry’s first open protocol for secure data sharing, making it simple to share data with other organizations regardless of where the data lives. Native integration with the Unity Catalog allows you to centrally manage and audit shared data across organizations. This allows you to confidently share data assets with suppliers and partners for better coordination of your business while meeting security and compliance needs. Integrations with leading tools and platforms allow you to visualize, query, enrich and govern shared data from your tools of choice.

Automated and trusted data engineering

Simplify data engineering with Delta Live Tables — an easy way to build and manage data pipelines for fresh, high-quality data on Delta Lake. It helps data engineering teams by simplifying ETL development and management with declarative pipeline development, improved data reliability and cloud-scale production operations to help build the lakehouse foundation.

Security and governance at scale

Delta Lake reduces risk by enabling fine-grained access controls for data governance, functionality typically not possible with data lakes. You can quickly and accurately update data in your data lake to comply with regulations like GDPR and maintain better data governance through audit logging. These capabilities are natively integrated and enhanced on Databricks as part of the Unity Catalog, the first multicloud data catalog for the lakehouse.

Use Cases

Learn more

Data Ingestion Network

Native connectors easily ingest data into Delta Lake quickly and reliably from all your applications, databases and file storage.

Customers

“Databricks delivered the time to market as well as the analytics and operational uplift that we needed in order to be able to meet the new demands of the healthcare sector.”
– Peter James, Chief Architect, Healthdirect Australia

Learn more

“By leveraging Databricks and Delta Lake, we have already been able to democratize data at scale, while lowering the cost of running production workloads by 60%, saving us millions of dollars.”
— Steve Pulec, Chief Technology Officer, YipitData

Learn more

“Delta Lake provides ACID capabilities that simplify data pipeline operations to increase pipeline reliability and data consistency. At the same time, features like caching and auto-indexing enable efficient and performant access to the data.”
— Lara Minor, Senior Enterprise Data Manager, Columbia Sportswear

Learn more

“Delta Lake has created a streamlined approach to the management of data pipelines. This has led to a decrease in operational costs while speeding up time-to-insight for downstream analytics and data science.”
— Parijat Dey, Assistant Vice President of Digital Transformation and Technology, Viacom18

Learn more