Lakeflow Declarative Pipelines is now Generally Available, and momentum hasn’t slowed since DAIS. This post rounds up everything that’s landed in the past few weeks - so you’re fully caught up on what’s here, what’s coming next, and how to start using it.
At Data + AI Summit 2025, we announced that we’ve contributed our core declarative pipeline technology to the Apache Spark™ project as Spark Declarative Pipelines. This contribution extends Spark’s declarative model from individual queries to full pipelines, letting developers define what their pipelines should do while Spark handles how to do it. Already proven across thousands of production workloads, it’s now an open standard for the entire Spark community.
We also announced the General Availability of Lakeflow, Databricks’ unified solution for data ingestion, transformation, and orchestration on the Data Intelligence Platform. The GA milestone also marked a major evolution for pipeline development. DLT is now Lakeflow Declarative Pipelines, with the same core benefits and full backward compatibility with your existing pipelines. We also introduced Lakeflow Declarative Pipelines’ new IDE for data engineering (shown above), built from the ground up to streamline pipeline development with features like code-DAG pairing, contextual previews, and AI-assisted authoring.
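For context on what "declarative" means in practice: a pipeline is just a set of tables declared as queries, and the framework infers the dependency graph and execution order. Here is a minimal sketch using the existing Python `dlt` API, which continues to work unchanged under the new name; the table names and storage path are illustrative placeholders, not part of the announcement.

```python
# A minimal Lakeflow Declarative Pipelines definition in Python.
# Table names, paths, and columns are illustrative.
import dlt
from pyspark.sql import functions as F


@dlt.table(comment="Raw orders ingested incrementally from cloud storage.")
def orders_raw():
    # Auto Loader picks up new files incrementally; the path is a placeholder.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/default/orders_landing/")
    )


@dlt.table(comment="Cleaned orders, built from the raw table.")
def orders_clean():
    # Referencing the upstream table by name is what lets the framework
    # infer the pipeline DAG.
    return (
        dlt.read_stream("orders_raw")
        .where(F.col("order_id").isNotNull())
        .withColumn("ingested_at", F.current_timestamp())
    )
```

You declare the tables; the engine decides how to materialize them, in what order, and incrementally where it can.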
Finally, we announced Lakeflow Designer, a no-code experience for building data pipelines. It makes ETL accessible to more users - without compromising on production readiness or governance - by generating real Lakeflow pipelines under the hood. Preview coming soon.
Together, these announcements represent a new chapter in data engineering—simpler, more scalable, and more open. And in the weeks since DAIS, we’ve kept the momentum going.
We’ve made significant backend improvements to help Lakeflow Declarative Pipelines run faster and more cost-effectively. Across the board, serverless pipelines now deliver better price-performance thanks to enhancements to the engine (Photon and Enzyme), autoscaling, and advanced features such as AutoCDC and data quality expectations.
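As a quick refresher on the two features named above: expectations declare row-level quality rules on a dataset, and AutoCDC (the evolution of the APPLY CHANGES functionality) keeps a target table in sync with a change feed. Below is a hedged sketch using the Python `dlt` API; the dataset names, keys, and sequencing column are assumptions for illustration only.

```python
import dlt
from pyspark.sql import functions as F


# Data quality expectation: drop rows that fail the rule; violation counts
# are recorded in the pipeline's event log.
@dlt.table
@dlt.expect_or_drop("valid_customer_id", "customer_id IS NOT NULL")
def customers_clean():
    return dlt.read_stream("customers_raw")


# Target table kept up to date from a change-data feed.
dlt.create_streaming_table("customers_silver")

# apply_changes upserts the change feed into the target, ordered by the
# sequence column; newer releases surface the same capability as AUTO CDC.
dlt.apply_changes(
    target="customers_silver",
    source="customers_clean",
    keys=["customer_id"],
    sequence_by=F.col("updated_at"),
    stored_as_scd_type=1,
)
```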
Here are the key takeaways:
These changes build on our ongoing commitment to make Lakeflow Declarative Pipelines the most efficient option for production ETL at scale.
Since the Data + AI Summit, we’ve delivered a series of updates that make pipelines more modular, production-ready, and easier to operate—without requiring additional configuration or glue code.
Managing table health is now easier and more cost-effective:
New capabilities give teams greater flexibility in how they structure and manage pipelines, all without any data reprocessing:
After you run the command and move the table definition from the source pipeline’s code to the destination pipeline’s code, the destination pipeline takes over updates for the table.
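The move itself doesn’t change the table’s code: conceptually, you cut the same definition out of the source pipeline’s source files and paste it, unchanged, into the destination pipeline’s files. A hedged sketch of what gets moved (the table name and logic are illustrative; see the documentation for the exact move command):

```python
# Removed from the source pipeline's source files...
# ...and added, unchanged, to the destination pipeline's source files,
# which then owns all future updates for this table.
import dlt


@dlt.table(comment="Daily order aggregates, now managed by the destination pipeline.")
def orders_daily():
    return (
        spark.read.table("main.sales.orders_clean")
        .groupBy("order_date")
        .count()
    )
```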
A new pipeline system table is now in Public Preview, giving you a complete, queryable view of all pipelines across your workspace. It includes metadata such as creator, tags, and lifecycle events (for example, deletions or configuration changes), and can be joined with billing logs for cost attribution and reporting. This is especially useful for teams managing many pipelines and looking to track cost across environments or business units.
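For example, you could join the pipeline system table with billing usage records to attribute serverless spend per pipeline. The sketch below assumes the table surfaces as `system.lakeflow.pipelines` with `pipeline_id` and `name` columns; verify the exact table and column names against the system tables documentation for your workspace.

```python
# Attribute billed usage to pipelines by joining the pipeline system table
# with billing logs. The pipeline table/column names are assumptions based
# on system schema naming conventions; system.billing.usage is the standard
# billing system table.
usage_by_pipeline = spark.sql("""
    SELECT
        p.pipeline_id,
        p.name                  AS pipeline_name,
        u.sku_name,
        SUM(u.usage_quantity)   AS total_dbus
    FROM system.lakeflow.pipelines AS p
    JOIN system.billing.usage      AS u
      ON u.usage_metadata.dlt_pipeline_id = p.pipeline_id
    GROUP BY p.pipeline_id, p.name, u.sku_name
    ORDER BY total_dbus DESC
""")
display(usage_by_pipeline)
```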
A second system table for pipeline updates - covering refresh history, performance, and failures - is planned for later this summer.
New to Lakeflow or looking to deepen your skills? We’ve launched three free self-paced training courses to help you get started:
All three courses are available now at no cost in Databricks Academy.