Delta Live Tables (DLT) is a declarative ETL framework for the Databricks Data Intelligence Platform that helps data teams simplify streaming and batch ETL cost-effectively. Simply define the transformations to perform on your data and let DLT pipelines automatically manage task orchestration, cluster management, monitoring, data quality and error handling.
Efficient data ingestion
Building production-ready ETL pipelines begins with ingestion. DLT powers easy, efficient ingestion for your entire team — from data engineers and Python developers to data scientists and SQL analysts. With DLT, load data from any data source supported by Apache Spark™ on Databricks.
- Use Auto Loader and streaming tables to incrementally land data into the Bronze layer for DLT pipelines or Databricks SQL queries
- Ingest from cloud storage, message buses and external systems
- Use change data capture (CDC) in DLT to update tables based on changes in source data (a minimal sketch of both patterns follows this list)
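To make that concrete, here is a minimal DLT Python sketch of both ingestion patterns: an Auto Loader streaming table and a CDC target driven by `apply_changes`. The storage path, table names and columns (`orders_bronze`, `customers_cdc_bronze`, `customer_id`, `operation`, `operation_ts`) are hypothetical placeholders, and `spark` is the session DLT provides inside a pipeline notebook.

```python
import dlt
from pyspark.sql.functions import col, expr

# Bronze: Auto Loader ("cloudFiles") incrementally discovers and loads new files
# from cloud storage into a streaming table.
@dlt.table(comment="Raw orders landed incrementally with Auto Loader")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/demo/raw/orders/")  # hypothetical landing path
    )

# CDC: apply inserts, updates and deletes from a change feed to a target table.
dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",
    source="customers_cdc_bronze",            # hypothetical change-feed table
    keys=["customer_id"],
    sequence_by=col("operation_ts"),          # ordering column in the feed
    apply_as_deletes=expr("operation = 'DELETE'"),
    stored_as_scd_type=1,
)
```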
“I love Delta Live Tables because it goes beyond the capabilities of Auto Loader to make it even easier to read files. My jaw dropped when we were able to set up a streaming pipeline in 45 minutes.”
— Kahveh Saramout, Senior Data Engineer, Labelbox
Intelligent, cost-effective data transformation
With just a few lines of code, DLT determines the most efficient way to build and execute your streaming or batch data pipelines, optimizing for price/performance (nearly 4x the Databricks baseline) while minimizing complexity.
- Instantly implement a streamlined medallion architecture with streaming tables and materialized views
- Optimize data quality for maximum business value with features like expectations (see the sketch after this list)
- Refresh pipelines in continuous or triggered mode to fit your data freshness needs
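Here is a hedged sketch of that medallion flow in DLT Python, continuing the hypothetical `orders_bronze` table from the ingestion example; all table and column names are made up for illustration. The first definition is a streaming table with expectations attached; the second is a materialized view that DLT keeps current on every triggered or continuous update.

```python
import dlt
from pyspark.sql.functions import sum as sum_

# Silver: expectations declare data quality rules. Rows violating "valid_amount"
# are dropped; "valid_customer" violations are only recorded as metrics.
@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")
@dlt.expect("valid_customer", "customer_id IS NOT NULL")
def orders_silver():
    return dlt.read_stream("orders_bronze").select(
        "order_id", "customer_id", "amount", "order_ts"
    )

# Gold: a materialized view aggregating the Silver table; DLT refreshes it
# along with the rest of the pipeline.
@dlt.table(comment="Revenue per customer")
def revenue_by_customer():
    return (
        dlt.read("orders_silver")
        .groupBy("customer_id")
        .agg(sum_("amount").alias("total_amount"))
    )
```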
“Delta Live Tables has helped our teams save time and effort in managing data at the multitrillion-record scale and continuously improves our AI engineering capability … Databricks is disrupting the ETL and data warehouse markets.”
— Dan Jeavons, General Manager Data Science, Shell
Simple pipeline setup and maintenance
DLT pipelines simplify ETL development by automating away virtually all the inherent operational complexity. With DLT pipelines, engineers can focus on delivering high-quality data rather than operating and maintaining pipelines. DLT automatically handles:
- Task orchestration
- CI/CD and version control
- Autoscaling compute infrastructure for cost savings
- Monitoring via metrics in the event log (see the query sketch after this list)
- Error handling and failure recovery
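Because every pipeline update writes its metrics to the event log, monitoring reduces to an ordinary query. The sketch below assumes the `event_log` table-valued function available for Unity Catalog-enabled pipelines, an ambient `spark` session in a Databricks notebook, and a placeholder pipeline ID; it summarizes expectation pass/fail counts per dataset.

```python
# Summarize data quality expectation results from a DLT pipeline's event log.
# "<pipeline-id>" is a placeholder for a real pipeline ID.
quality = spark.sql("""
    SELECT
      expectation.dataset,
      expectation.name,
      SUM(expectation.passed_records) AS passing_records,
      SUM(expectation.failed_records) AS failing_records
    FROM (
      SELECT explode(
        from_json(
          get_json_object(details, '$.flow_progress.data_quality.expectations'),
          'array<struct<name: string, dataset: string, passed_records: int, failed_records: int>>'
        )
      ) AS expectation
      FROM event_log("<pipeline-id>")
      WHERE event_type = 'flow_progress'
    )
    GROUP BY expectation.dataset, expectation.name
""")
quality.show()
```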
“Complex architectures, such as dynamic schema management and stateful/stateless transformations, were challenging to implement with a classic multicloud data warehouse architecture. Both data scientists and data engineers can now perform such changes using scalable Delta Live Tables with no barriers to entry.”
— Sai Ravuru, Senior Manager of Data Science and Analytics, JetBlue
Next-gen stream processing engine
Spark Structured Streaming is the core technology that unlocks streaming DLT pipelines, providing a unified API for batch and stream processing. DLT pipelines leverage Spark Structured Streaming's inherent subsecond latency and record-breaking price/performance. Although you can manually build your own performant streaming pipelines with Spark Structured Streaming, DLT pipelines may provide faster time to value, better ongoing development velocity and lower TCO because they automatically manage the operational overhead.
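For comparison, a hand-rolled Structured Streaming job makes that operational overhead explicit: you choose and manage the checkpoint location, schema location, trigger, output mode and restart behavior yourself. A minimal sketch, with hypothetical paths and table names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Read the same hypothetical landing zone with Auto Loader, but outside DLT.
raw = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/demo/_schemas/orders")
    .load("/Volumes/demo/raw/orders/")
)

cleaned = raw.where("amount > 0")

# You own the checkpoint, trigger and retry/restart story here;
# in a DLT pipeline these are managed for you.
query = (
    cleaned.writeStream.format("delta")
    .option("checkpointLocation", "/Volumes/demo/_checkpoints/orders_silver")
    .trigger(availableNow=True)
    .outputMode("append")
    .toTable("demo.default.orders_silver")
)
query.awaitTermination()
```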
“We didn’t have to do anything to get DLT to scale. We give the system more data, and it copes. Out of the box, it’s given us the confidence that it will handle whatever we throw at it.”
— Dr. Chris Inkpen, Global Solutions Architect, Honeywell
Delta Live Tables pipelines vs. “build your own” Spark Structured Streaming pipelines
| | Spark Structured Streaming pipelines | DLT pipelines |
|---|---|---|
| Run on the Databricks Data Intelligence Platform | ✓ | ✓ |
| Powered by Spark Structured Streaming engine | ✓ | ✓ |
| Unity Catalog integration | ✓ | ✓ |
| Orchestrate with Databricks Workflows | ✓ | ✓ |
| Ingest from dozens of sources — from cloud storage to message buses | ✓ | ✓ |
| Dataflow orchestration | Manual | Automated |
| Data quality checks and assurance | Manual | Automated |
| Error handling and failure recovery | Manual | Automated |
| CI/CD and version control | Manual | Automated |
| Compute autoscaling | Basic | Enhanced |
Unified data governance and storage
Running DLT pipelines on Databricks means you benefit from the foundational components of the Data Intelligence Platform built on lakehouse architecture — Unity Catalog and Delta Lake. Your raw data is optimized with Delta Lake, the only open source storage framework designed from the ground up for both streaming and batch data. Unity Catalog gives you fine-grained, integrated governance for all your data and AI assets with one consistent model to discover, access and share data across clouds. Unity Catalog also provides native support for Delta Sharing, the industry’s first open protocol for simple and secure data sharing with other organizations.
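As a small illustration of that single governance model, access to a table that a DLT pipeline publishes into Unity Catalog is granted with standard SQL. The catalog, schema, table and group names below are hypothetical, and `spark` is the ambient session in a Databricks notebook.

```python
# Grant read access on a DLT-published table to an analyst group.
spark.sql("GRANT SELECT ON TABLE main.sales.orders_silver TO `data-analysts`")

# Members of the group can now query it like any other Unity Catalog table.
spark.sql("SELECT COUNT(*) FROM main.sales.orders_silver").show()
```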
“We are incredibly excited about the integration of Delta Live Tables with Unity Catalog. This integration will help us streamline and automate data governance for our DLT pipelines, helping us meet our sensitive data and security requirements as we ingest millions of events in real time. This opens up a world of potential and enhancements for our business use cases related to risk modeling and fraud detection.”
— Yue Zhang, Staff Software Engineer, Block