What is Delta Lake?
Delta Lake is an open format storage layer that delivers reliability, security and performance on your data lake — for both streaming and batch operations. By replacing data silos with a single home for structured, semi-structured and unstructured data, Delta Lake is the foundation of a cost-effective, highly scalable lakehouse.
High-quality, reliable data
Deliver a reliable single source of truth for all of your data, including real-time streams, so your data teams are always working with the most current data. With support for ACID transactions and schema enforcement, Delta Lake provides the reliability that traditional data lakes lack. This enables you to scale reliable data insights throughout the organization and run analytics and other data projects directly on your data lake — for up to 50x faster time-to-insight.
Open and secure data sharing
Delta Sharing is the industry’s first open protocol for secure data sharing, making it simple to share data with other organizations regardless of where the data lives. Native integration with the Unity Catalog allows you to centrally manage and audit shared data across organizations. This allows you to confidently share data assets with suppliers and partners for better coordination of your business while meeting security and compliance needs. Integrations with leading tools and platforms allow you to visualize, query, enrich, and govern shared data from your tools of choice.
With Apache Spark™ under the hood, Delta Lake delivers massive scale and speed. And because it’s optimized with performance features like indexing, Delta Lake customers have seen ETL workloads execute up to 48x faster.
Open and agile
All data in Delta Lake is stored in open Apache Parquet format, allowing data to be read by any compatible reader. APIs are open and compatible with Apache Spark. With Delta Lake on Databricks, you have access to a vast open source ecosystem and avoid data lock-in from proprietary formats.
Automated and trusted data engineering
Simplify data engineering with Delta Live Tables – an easy way to build and manage data pipelines for fresh, high-quality data on Delta Lake. It helps data engineering teams by simplifying ETL development and management with declarative pipeline development, improved data reliability and cloud-scale production operations to help build the lakehouse foundation.
Security and governance at scale
Delta Lake reduces risk by enabling fine-grained access controls for data governance, functionality typically not possible with data lakes. You can quickly and accurately update data in your data lake to comply with regulations like GDPR and maintain better data governance through audit logging. These capabilities are natively integrated and enhanced on Databricks as part of the Unity Catalog, the first multi-cloud data catalog for the Lakehouse.
BI on your data
Make new, real-time data instantly available for querying by data analysts for immediate insights on your business by running business intelligence workloads directly on your data lake. Delta Lake allows you to operate a multicloud lakehouse architecture that provides data warehousing performance at data lake economics for up to 6x better price/performance for SQL workloads than traditional cloud data warehouses.
Unify batch and streaming
Run both batch and streaming operations on one simplified architecture that avoids complex, redundant systems and operational challenges. In Delta Lake, a table is both a batch table and a streaming source and sink. Streaming data ingest, batch historic backfill and interactive queries all work out of the box and directly integrate with Spark Structured Streaming.
Meet regulatory needs
Delta Lake removes the malformed data ingestion challenges, difficulty deleting data for compliance, and issues modifying data for change data capture. With support for ACID transactions on your data lake, Delta Lake ensures that every operation either fully succeeds or fully aborts for later retries — without requiring new data pipelines to be created. Additionally, Delta Lake records all past transactions on your data lake, so it’s easy to access and use previous versions of your data to meet compliance standards like GDPR and CCPA reliably.
Data Ingestion Network
Native connectors easily ingest data into Delta Lake quickly and reliably from all your applications, databases and file storage.
“Databricks delivered the time to market as well as the analytics and operational uplift that we needed in order to be able to meet the new demands of the healthcare sector.”
– Peter James, Chief Architect, Healthdirect Australia
“By leveraging Databricks and Delta Lake, we have already been able to democratize data at scale, while lowering the cost of running production workloads by 60%, saving us millions of dollars.”
— Steve Pulec, Chief Technology Officer, YipitData
“Delta Lake provides ACID capabilities that simplify data pipeline operations to increase pipeline reliability and data consistency. At the same time, features like caching and auto-indexing enable efficient and performant access to the data.”
— Lara Minor, Senior Enterprise Data Manager, Columbia Sportswear
“Delta Lake has created a streamlined approach to the management of data pipelines. This has led to a decrease in operational costs while speeding up time-to-insight for downstream analytics and data science.”
— Parijat Dey, Assistant Vice President of Digital Transformation and Technology, Viacom18