Databricks Workflows

Unified orchestration for data, analytics and AI on the Data Intelligence Platform

Databricks Workflows is a managed orchestration service, fully integrated with the Databricks Data Intelligence Platform. Workflows lets you easily define, manage and monitor multitask workflows for ETL, analytics and machine learning pipelines. With a wide range of supported task types, deep observability capabilities and high reliability, your data teams can better automate and orchestrate any pipeline and become more productive.

“If we went back to 2018 and Databricks Workflows was available, we would never have considered building out a custom Airflow setup. We would just use Workflows.”

— Hillevi Crognale, Engineering Manager, YipitData
Learn more

Simple Authoring

Whether you’re a data engineer, a data analyst or a data scientist, easily define workflows with just a few clicks or use your favorite IDE.

Actionable Insights

Get full visibility into each task running in every workflow and get notified immediately on issues that require troubleshooting.

Proven Reliability

A fully managed orchestration service gives you peace of mind that your production workflows are up and running. With 99.95% uptime, Databricks Workflows is trusted by thousands of organizations.

How does it work?

Unified with the Databricks Data Intelligence Platform

Reliability at scale

Deep monitoring and observability

Batch and streaming

Efficient compute

Seamless user experience

“Using Databricks Workflows allowed us to encourage collaboration and break up the walls between different stages of the process. It allowed us all to speak the same language.”

— Yanyan Wu, Vice President of Data, Wood Mackenzie
Learn more

Unified with the Databricks Data Intelligence Platform

Unlike external orchestration tools, Databricks Workflows is fully integrated with the Databricks Data Intelligence Platform. This means you get native workflow authoring in your workspace and the ability to automate any platform capability, including Delta Live Tables pipelines, Databricks notebooks and Databricks SQL queries. With Unity Catalog, you get automated data lineage for every workflow, so you stay in control of all your data assets across the organization.
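
To make this concrete, here is a minimal sketch using the Databricks SDK for Python of a workflow that chains a notebook, a Delta Live Tables pipeline and a SQL query. This is an illustration, not official example code: the notebook path, pipeline ID, query ID and warehouse ID are placeholders for resources in your own workspace.

```python
# Hedged sketch. Assumes the databricks-sdk package is installed and the
# workspace is authenticated via environment variables or ~/.databrickscfg.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

created = w.jobs.create(
    name="daily-pipeline",
    tasks=[
        # Task 1: run a Databricks notebook (compute settings omitted for brevity)
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/etl/ingest"),
        ),
        # Task 2: refresh a Delta Live Tables pipeline once ingestion succeeds
        jobs.Task(
            task_key="transform",
            depends_on=[jobs.TaskDependency(task_key="ingest")],
            pipeline_task=jobs.PipelineTask(pipeline_id="<dlt-pipeline-id>"),
        ),
        # Task 3: run a Databricks SQL query on a SQL warehouse
        jobs.Task(
            task_key="report",
            depends_on=[jobs.TaskDependency(task_key="transform")],
            sql_task=jobs.SqlTask(
                query=jobs.SqlTaskQuery(query_id="<query-id>"),
                warehouse_id="<warehouse-id>",
            ),
        ),
    ],
)
print(f"Created job {created.job_id}")
```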

Reliability at scale

Every day, thousands of organizations trust Databricks Workflows to run millions of production workloads across AWS, Azure and GCP with 99.95% uptime. Because orchestration is fully managed and built into the Data Intelligence Platform, you don’t need to maintain, update or troubleshoot a separate orchestration tool.

Deep monitoring and observability

Full integration with the Data Intelligence Platform means Databricks Workflows provides better observability than any external orchestration tool. Stay in control with a full view of every workflow run, and set failure notifications that alert your team via email, Slack, PagerDuty or a custom webhook so you can get ahead of issues and troubleshoot before data consumers are impacted.
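
As an illustration, here is a minimal sketch of attaching failure notifications to an existing job with the Databricks SDK for Python. The job ID, email address and webhook destination ID are placeholders; webhook destinations for Slack, PagerDuty or custom endpoints are configured separately by a workspace admin.

```python
# Hedged sketch: adding failure alerts to an existing job.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.update(
    job_id=123,  # placeholder job ID
    new_settings=jobs.JobSettings(
        email_notifications=jobs.JobEmailNotifications(
            on_failure=["data-team@example.com"],  # placeholder address
        ),
        webhook_notifications=jobs.WebhookNotifications(
            # The ID references an already-configured notification destination
            on_failure=[jobs.Webhook(id="<notification-destination-id>")],
        ),
    ),
)
```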

Batch and streaming

Databricks Workflows provides a single solution for orchestrating tasks in any scenario on the Data Intelligence Platform. Use scheduled workflow runs for recurring batch ingestion jobs at preset times, or implement real-time data pipelines that run continuously. You can also trigger a workflow when new data becomes available using file arrival triggers.
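
The three triggering styles can be sketched with the Databricks SDK for Python as follows. The cron expression, notebook path and storage URL are placeholders; file arrival triggers typically point at a Unity Catalog external location.

```python
# Hedged sketch of the three trigger styles; names and paths are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
tasks = [jobs.Task(task_key="main",
                   notebook_task=jobs.NotebookTask(notebook_path="/Workspace/etl/main"))]

# 1) Scheduled batch run: every day at 02:00 UTC (Quartz cron syntax)
w.jobs.create(name="batch-job", tasks=tasks,
              schedule=jobs.CronSchedule(quartz_cron_expression="0 0 2 * * ?",
                                         timezone_id="UTC"))

# 2) Continuous run for real-time pipelines: restarts automatically
w.jobs.create(name="streaming-job", tasks=tasks,
              continuous=jobs.Continuous(pause_status=jobs.PauseStatus.UNPAUSED))

# 3) File arrival trigger: run when new files land in a storage location
w.jobs.create(name="file-triggered-job", tasks=tasks,
              trigger=jobs.TriggerSettings(
                  file_arrival=jobs.FileArrivalTriggerConfiguration(
                      url="s3://my-bucket/landing/")))  # placeholder location
```

Setting pause_status to PAUSED instead lets you define a schedule or continuous run now and switch it on later.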

Efficient compute

Orchestrating with Databricks Workflows gives you better price/performance for your automated, production workloads. Save significantly by using automated job clusters, which are billed at a lower rate and run only while a job is executing, so you never pay for unused resources. In addition, shared job clusters let you reuse compute across multiple tasks to optimize resource utilization.
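
For illustration, here is a minimal sketch with the Databricks SDK for Python of two tasks sharing one automated job cluster through a job_cluster_key. The Spark version and node type are placeholders; use values available in your workspace.

```python
# Hedged sketch: one automated job cluster shared by two tasks.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

w.jobs.create(
    name="shared-cluster-job",
    job_clusters=[
        jobs.JobCluster(
            job_cluster_key="shared",
            new_cluster=compute.ClusterSpec(
                spark_version="15.4.x-scala2.12",  # placeholder runtime
                node_type_id="i3.xlarge",          # placeholder node type
                num_workers=2,
            ),
        )
    ],
    tasks=[
        # Both tasks reference the same cluster, so it is created once,
        # reused across the run and terminated when the job finishes.
        jobs.Task(task_key="extract", job_cluster_key="shared",
                  notebook_task=jobs.NotebookTask(notebook_path="/Workspace/etl/extract")),
        jobs.Task(task_key="load", job_cluster_key="shared",
                  depends_on=[jobs.TaskDependency(task_key="extract")],
                  notebook_task=jobs.NotebookTask(notebook_path="/Workspace/etl/load")),
    ],
)
```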

Seamless user experience

Define workflows in your preferred environment: create them right in the Databricks workspace UI or from your favorite IDE. Define tasks that use a version-controlled notebook in a Databricks Repo or in a remote Git repository, and adhere to DevOps best practices such as CI/CD.
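
As a sketch, here is how a job can run a notebook directly from a remote Git branch using the Databricks SDK for Python. The repository URL, branch and notebook path are placeholders.

```python
# Hedged sketch: the job checks out the repository at run time, so the
# notebook version that executes is whatever is on the named branch.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="git-backed-job",
    git_source=jobs.GitSource(
        git_url="https://github.com/my-org/etl-pipelines",  # placeholder repo
        git_provider=jobs.GitProvider.GIT_HUB,
        git_branch="main",
    ),
    tasks=[
        jobs.Task(
            task_key="run",
            notebook_task=jobs.NotebookTask(
                notebook_path="notebooks/etl",  # relative to the repo root
                source=jobs.Source.GIT,
            ),
        )
    ],
)
```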

“Databricks Workflows allows us to clearly see how every job ran and whether it succeeded or failed. In our previous solution, we had a lot of moving parts, we had multiple triggers and multiple dependent pipelines that triggered each other. With the use of Workflows, there is only one job where we have all that information right in front of us.”

— Charlotte van der Scheun, Tech Lead, Platform Engineering, Ahold Delhaize
Learn more

Integrations

Databricks Workflows integrates seamlessly with leading industry partners, giving you the flexibility to define workflows that meet your needs with your data solution of choice.

dbt Labs
Arcion
Matillion
Azure Data Factory
Apache Airflow
Fivetran

FAQ

What is orchestration?

Orchestration, in the context of data, analytics and AI, refers to the automation, deployment and management of workflows such as ETL data pipelines and machine learning model training. It is an important part of data operations and essential for bringing data solutions to production. Orchestration involves managing the dependencies between workflow tasks, scheduling those tasks for execution, allocating compute resources and monitoring the resulting workflows.

Ready to get started?