APACHE SPARK™ DECLARATIVE PIPELINES

Reliable data pipelines made easy

Simplify batch and streaming ETL with automated reliability and built-in data quality.

Learn how to build ETL pipelines with SQL

Build batch and real-time ETL pipelines using SQL. No data engineering support needed.

BENEFITS

Data pipeline best practices, codified

Simply declare the data transformations you need — let Spark Declarative Pipelines handle the rest.

Efficient ingestion

Building production-ready data pipelines starts with ingestion. Spark Declarative Pipelines enables efficient ingestion for data engineers, Python developers, data scientists and SQL analysts. Load data from any Apache Spark™–supported source on the Databricks Platform, whether batch, streaming or CDC.

Intelligent transformation

From just a few lines of code, Spark Declarative Pipelines determines the most efficient way to build and execute your batch or streaming data pipelines, automatically optimizing for cost or performance while minimizing complexity.

Automated operations

Spark Declarative Pipelines simplifies pipeline development by codifying best practices out of the box, automating dependency management, scaling and recovery, data quality rules and more. With Spark Declarative Pipelines, engineers can focus on delivering high-quality data rather than operating and maintaining pipeline infrastructure.

FEATURES

Built to simplify data pipelining

Building and operating data pipelines can be hard — but it doesn’t have to be. Spark Declarative Pipelines is built for powerful simplicity, so you can perform robust ETL with just a few lines of code.

Use Genie Code to automate ETL workloads, optimize queries and build pipelines through natural conversation.

Leveraging Spark’s unified API for batch and stream processing, Spark Declarative Pipelines allows you to easily toggle between processing modes.
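The idea of one transformation serving both processing modes can be sketched in plain Python. This is a conceptual illustration only, not the Spark Declarative Pipelines API: the function `clean_orders`, the record fields and the generator `order_stream` are all hypothetical names chosen for the example.

```python
from typing import Iterable, Iterator


def clean_orders(records: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical transformation: keep paid orders and add a total."""
    for r in records:
        if r["status"] == "paid":
            yield {**r, "total": r["qty"] * r["unit_price"]}


# Batch mode: the whole input is available up front.
batch_input = [
    {"id": 1, "status": "paid", "qty": 2, "unit_price": 5.0},
    {"id": 2, "status": "cancelled", "qty": 1, "unit_price": 9.0},
]
batch_result = list(clean_orders(batch_input))


# Streaming mode: records arrive one at a time from a generator.
def order_stream() -> Iterator[dict]:
    yield {"id": 3, "status": "paid", "qty": 1, "unit_price": 4.0}
    yield {"id": 4, "status": "paid", "qty": 3, "unit_price": 2.0}


stream_totals = [row["total"] for row in clean_orders(order_stream())]
```

The transformation logic is written once; only the way the input is supplied changes between the two modes.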

Spark Declarative Pipelines makes it easy to optimize pipeline performance by declaring an entire incremental data pipeline with streaming tables and materialized views.
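As a rough illustration of what "incremental" means here, the sketch below keeps an aggregate up to date by reading only the records appended since the last refresh. It is plain Python, not the product's implementation; the class `IncrementalSum`, its `offset` bookkeeping and the append-only `log` list are assumptions made for the example.

```python
class IncrementalSum:
    """Toy aggregate that is refreshed from new records only."""

    def __init__(self):
        self.offset = 0   # number of log records already processed
        self.totals = {}  # running sum per key

    def refresh(self, log):
        """Fold in records appended since the last refresh; return how many."""
        new_records = log[self.offset:]
        for key, amount in new_records:
            self.totals[key] = self.totals.get(key, 0) + amount
        self.offset = len(log)
        return len(new_records)


log = [("a", 1), ("b", 2)]      # hypothetical append-only source
view = IncrementalSum()
first = view.refresh(log)       # processes both existing records
log.append(("a", 5))
second = view.refresh(log)      # processes only the newly appended record
```

The second refresh touches one record rather than three, which is the cost saving that incremental processing aims for.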

Spark Declarative Pipelines supports a broad ecosystem of sources and sinks. Load data from any source — including cloud storage, message buses, change data feeds, databases and enterprise apps.

Expectations let you guarantee that data arriving in tables meets data quality requirements, and they provide insight into data quality with each pipeline update.
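A minimal sketch of how a data quality rule of this kind might behave, written in plain Python rather than with the product's own API. The `Expectation` class, the `apply_expectations` helper and the three violation policies ("warn", "drop", "fail") are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Expectation:
    name: str
    check: Callable[[dict], bool]
    on_violation: str = "warn"  # assumed policies: "warn", "drop" or "fail"


def apply_expectations(rows, expectations):
    """Return the rows that survive, plus a violation count per rule."""
    kept = []
    violations = {e.name: 0 for e in expectations}
    for row in rows:
        keep = True
        for e in expectations:
            if e.check(row):
                continue
            violations[e.name] += 1
            if e.on_violation == "fail":
                raise ValueError(f"expectation {e.name!r} failed for {row}")
            if e.on_violation == "drop":
                keep = False
        if keep:
            kept.append(row)
    return kept, violations


rows = [
    {"id": 1, "amount": 10},
    {"id": None, "amount": 5},
    {"id": 3, "amount": -2},
]
rules = [
    Expectation("valid_id", lambda r: r["id"] is not None, "drop"),
    Expectation("non_negative_amount", lambda r: r["amount"] >= 0, "warn"),
]
kept, violations = apply_expectations(rows, rules)
```

Here the row with a missing `id` is dropped, the negative amount is only counted, and the per-rule counts are the kind of quality metric that could be reported with each update.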

Develop pipelines in the IDE for data engineering without any context switching. See the DAG, data preview and execution insights in one UI. Develop code easily with autocomplete, in-line errors and diagnostics.

More features

Unified Governance and Storage

Built on the foundational lakehouse standards of Unity Catalog and open table formats.

Serverless Compute

Up to 5x better price/performance for data ingestion and 98% cost savings for complex transformations.

Task Orchestration

Instead of manually defining a series of separate Apache Spark™ tasks, you define the transformations, and Spark Declarative Pipelines ensures they're executed in the correct sequence.
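Deriving an execution order from declared dependencies can be sketched with a topological sort. The example below uses Python's standard-library `graphlib`; it shows the general technique, not how Spark Declarative Pipelines is implemented, and the dataset names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each dataset maps to the datasets it reads from.
dependencies = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "orders_by_customer": {"clean_orders", "raw_customers"},
}

# static_order() yields every dataset after all of its inputs.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

Only the inputs of each dataset are declared; the ordering falls out of the graph, so adding a new transformation does not require editing a hand-written task sequence.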

Error Handling and Failure Recovery

Seamless recovery from errors that occur during the execution of data pipelines.

CI/CD and Version Control

Easily specify configurations to isolate pipelines in development, testing and production environments.

Pipeline Monitoring and Observability

Built-in monitoring and observability features, including data lineage, update history and data quality reporting.

Flexible Refresh Scheduling

Easily optimize for latency or cost depending on your pipeline’s requirements.

USE CASES

Streamline your data pipelines

Make sources, transformations and destinations simple

Declarative programming means you get to harness the power of ETL on the Databricks Platform with just a few lines of code.

Explore Spark Declarative Pipelines demos

VIDEO

Lakeflow in Action: Gourmet Pipeline Demo

TECHNICAL GUIDE

Getting Started With Spark Declarative Pipelines

PRODUCT TOUR

Spark Declarative Pipelines Product Tour

Building a Data Application with LakeFlow

VIDEO

Get to Know Genie Code

PRICING

Usage-based pricing keeps spending in check

Only pay for the products you use at per-second granularity.

Discover more

Explore other integrated, intelligent offerings on the Databricks Platform.

Lakeflow Connect

Efficient data ingestion connectors from any source and native integration with the Databricks Platform unlock easy access to analytics and AI, with unified governance.

Lakeflow Jobs

Easily define, manage and monitor multitask workflows for ETL, analytics and machine learning pipelines. With a wide range of supported task types, deep observability capabilities and high reliability, your data teams are empowered to better automate and orchestrate any pipeline and become more productive.

Genie Code

Your autonomous AI partner for data work.

Lakehouse Storage

Unify the data in your lakehouse, across all formats and types, for all your analytics and AI workloads.

Unity Catalog

Seamlessly govern all your data assets with the industry’s only unified and open governance solution for data and AI, built into the Databricks Platform.

The Databricks Platform

Find out how the Databricks Platform enables your data and AI workloads.

Take the next step

Explore the Spark Declarative Pipelines docs

Everything you need to get started using Spark Declarative Pipelines on AWS, Microsoft Azure or Google Cloud Platform.

Start a free trial

Test-drive the full Databricks Platform for free.

Spark Declarative Pipelines FAQ

