
Best Practices: Kicking off Databricks Workflows Natively in Azure Data Factory

Azure Data Factory customers can now get the most out of the Databricks Data Intelligence Platform by using Databricks Workflows


Published: May 16, 2025

Partners · 5 min read

Summary

Azure Data Factory customers can now trigger Databricks Workflows using the new native Databricks Job activity, unlocking deeper integration between the two platforms. This best practice helps customers fully leverage the Databricks Data Intelligence Platform, including advanced features like Databricks SQL, DLT, and Power BI publishing. By migrating from Notebook activities to Workflows, customers can improve performance, reduce costs, and simplify operations across their data and AI pipelines.

Azure Databricks is a first-party Microsoft service, natively integrated with the Azure ecosystem to unify data and AI with high-performance analytics and deep tooling support. This tight integration now includes a native Databricks Job activity in Azure Data Factory (ADF), making it easier than ever to trigger Databricks Workflows directly within ADF.

This new activity in ADF is an immediate best practice, and all ADF and Azure Databricks users should consider moving to this pattern.

The new Databricks Job activity is very simple to use:

  1. In your ADF pipeline, drag the Databricks Job activity onto the pipeline canvas
  2. On the Azure Databricks tab, select a Databricks linked service for authentication to the Azure Databricks workspace
    • You can authenticate using one of these options:
      • a personal access token (PAT)
      • the ADF system-assigned managed identity, or
      • a user-assigned managed identity
    • Although the linked service requires you to configure a cluster, this cluster is neither created nor used when executing this activity; it is retained only for compatibility with other activity types

[Screenshot: Databricks Job activity in ADF]

  3. On the Settings tab, select the Databricks Workflow to execute from the Job drop-down list (you'll only see the Jobs your authenticated principal has access to). In the Job Parameters section below, configure any Job Parameters to send to the Databricks Workflow. To learn more about Databricks Job Parameters, see the docs.

  • Note that both the Job and the Job Parameters can be configured with dynamic content
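On the Databricks side, a notebook task in the triggered workflow can read those job parameters as widgets. The snippet below is a minimal sketch meant to run inside a Databricks notebook; the parameter name ingest_date is a placeholder for whatever keys you configure in ADF.

```python
# Inside a notebook task of the triggered workflow: job parameters passed from
# the ADF Databricks Job activity are exposed to notebooks as widgets.
# "ingest_date" is a placeholder name; use the keys you configure in ADF.
ingest_date = dbutils.widgets.get("ingest_date")
print(f"Processing data for {ingest_date}")
```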

[Screenshot: Job parameters in the ADF Databricks Job activity]

That’s all there is to it. ADF will kick off your Databricks Workflow and return the Job Run ID and URL, then poll for the Job Run to complete. Read more below to learn why this new pattern is an instant classic.
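That trigger-and-poll behavior maps directly onto the Databricks Jobs API. If you want to reproduce or test it outside of ADF, here is a minimal sketch using the Databricks Python SDK; the job ID and parameter are placeholders, and authentication is assumed to be configured via environment variables or another supported method.

```python
# Minimal sketch of the same trigger-and-poll pattern the ADF Databricks Job
# activity performs: start a job run with parameters, then wait for a terminal
# state. Assumes databricks-sdk is installed (pip install databricks-sdk) and
# DATABRICKS_HOST / DATABRICKS_TOKEN (or equivalent) are set.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Kick off the workflow, passing job parameters (analogous to the Job
# Parameters section of the ADF activity), and block until the run completes.
run = w.jobs.run_now(
    job_id=123456789,                             # placeholder: your Job ID
    job_parameters={"ingest_date": "2025-05-16"}, # placeholder parameter
).result()

print(f"Run ID:  {run.run_id}")
print(f"Run URL: {run.run_page_url}")
print(f"Result:  {run.state.result_state}")
```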


Kicking off Databricks Workflows from ADF lets you get more horsepower out of your Azure Databricks investment

Using Azure Data Factory and Azure Databricks together has been a generally available (GA) pattern since 2018, when the integration was announced in this blog post. Since then, the integration has been a staple for Azure customers, who have primarily followed this simple pattern:

  1. Use ADF to land data into Azure storage via its 100+ connectors, using a self-hosted integration runtime for private or on-premises connections
  2. Orchestrate Databricks Notebooks via the native Databricks Notebook activity to implement scalable data transformation in Databricks using Delta Lake tables in ADLS

While this pattern has been extremely valuable over time, it has constrained customers into the following modes of operation, which rob them of the full value of Databricks:

  • Using All-Purpose compute to run jobs to avoid cluster launch times -> running into noisy neighbor problems and paying All-Purpose compute rates for automated jobs
  • Waiting for cluster launches per Notebook execution when using Jobs compute -> classic clusters are spun up per notebook execution, incurring cluster launch time for each, even for a DAG of notebooks
  • Managing Pools to reduce Job cluster launch times -> pools can be hard to manage and can often lead to paying for VMs that aren’t being utilized
  • Using an overly permissive permissions pattern for the integration between ADF and Azure Databricks -> the integration requires workspace admin OR the create cluster entitlement
  • No ability to use newer Databricks features like Databricks SQL, DLT, or Serverless compute

While this pattern is scalable and native to Azure Data Factory and Azure Databricks, the tooling and capabilities it offers have remained the same since its launch in 2018, even though Databricks has grown by leaps and bounds into the market-leading Data Intelligence Platform across all clouds.

Azure Databricks goes beyond traditional analytics to deliver a unified Data Intelligence Platform on Azure. It combines industry-leading Lakehouse architecture with built-in AI and advanced governance to help customers unlock insights faster, at lower cost, and with enterprise-grade security. Key capabilities include:

  • OSS and open standards
  • An industry-leading lakehouse catalog, Unity Catalog, for securing data and AI across code, languages, and compute inside and outside of Azure Databricks
  • Best-in-class performance and price/performance for ETL
  • Built-in capabilities for traditional ML and GenAI, including fine-tuning LLMs, using foundation models (including Claude Sonnet), building agent applications, and serving models
  • Best-in-class data warehousing on the lakehouse with Databricks SQL
  • Automated publishing and integration with Power BI through the Publish to Power BI functionality found in Unity Catalog and Workflows

With the release of the native Databricks Job activity in Azure Data Factory, customers can now execute Databricks Workflows and pass parameters to the job runs. This new pattern not only addresses the constraints highlighted above, it also unlocks Databricks capabilities that were not previously available from ADF (a sketch of a multi-task workflow definition follows the list below), including:

  • Programming a DAG of Tasks inside Databricks
  • Using Databricks SQL integrations
  • Executing DLT pipelines
  • Using dbt integration with a SQL Warehouse
  • Using Classic Job Cluster reuse to reduce cluster launch times
  • Using Serverless Jobs compute
  • Standard Databricks Workflow functionality such as Run As, Task Values, conditional execution (If/Else and For Each), the AI/BI Task, Repair Runs, Notifications/Alerts, Git integration, DABs support, built-in lineage, queuing and concurrent runs, and much more...
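To make the comparison concrete, below is a minimal, hypothetical sketch of defining such a workflow with the Databricks Python SDK: a small DAG of notebook tasks sharing one job cluster (classic cluster reuse) with a job-level parameter. The job name, notebook paths, cluster settings, and parameter name are placeholders; in practice you might define the same workflow in the UI, with Databricks Asset Bundles, or with Terraform.

```python
# Hedged sketch: a two-task DAG that reuses one job cluster across tasks and
# accepts a job-level parameter. Assumes databricks-sdk is installed and
# authentication is configured (e.g., environment variables). All names,
# paths, and sizes below are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

# One classic job cluster shared by all tasks (cluster reuse). To use
# Serverless Jobs compute instead, omit job_clusters / job_cluster_key
# where your workspace supports it.
shared_cluster = jobs.JobCluster(
    job_cluster_key="etl_cluster",
    new_cluster=compute.ClusterSpec(
        spark_version="15.4.x-scala2.12",  # placeholder runtime version
        node_type_id="Standard_D4ds_v5",   # placeholder Azure VM type
        num_workers=2,
    ),
)

created = w.jobs.create(
    name="adf-triggered-etl",  # placeholder job name
    job_clusters=[shared_cluster],
    parameters=[jobs.JobParameterDefinition(name="ingest_date", default="")],
    tasks=[
        jobs.Task(
            task_key="bronze",
            job_cluster_key="etl_cluster",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/bronze"),
        ),
        jobs.Task(
            task_key="silver",
            depends_on=[jobs.TaskDependency(task_key="bronze")],
            job_cluster_key="etl_cluster",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/silver"),
        ),
    ],
)
print(f"Created job {created.job_id}")
```

Once a workflow like this exists, it appears in the Job drop-down list of the ADF Databricks Job activity and can be triggered with parameters exactly as described in the steps above.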

Most importantly, customers can now use the ADF Databricks Job activity to leverage the Publish to Power BI Tasks in Databricks Workflows, which automatically publish Semantic Models to the Power BI Service from schemas in Unity Catalog and trigger an import refresh if there are tables using the Import or Dual storage modes (see the setup instructions in the documentation). A demo of Power BI Tasks in Databricks Workflows can be found here. To complement this, check out the Power BI on Databricks Best Practices Cheat Sheet – a concise, actionable guide that helps teams configure and optimize their reports for performance, cost, and user experience from the start.

[Screenshot: Power BI task in a Databricks Workflow]

[Screenshot: Publish to Power BI task configuration]

The Databricks Job activity in ADF is the New Best Practice

Using the Databricks Job activity in Azure Data Factory to kick off Databricks Workflows is the new best-practice integration between the two tools. Customers can immediately start using this pattern to take advantage of all of the capabilities in the Databricks Data Intelligence Platform. For customers on ADF, adopting the Databricks Job activity will deliver immediate business value and cost savings. Customers with ETL frameworks built on Notebook activities should migrate those frameworks to Databricks Workflows and the new ADF Databricks Job activity, and prioritize this initiative in their roadmap.

Get Started with a Free 14-day Trial of Azure Databricks.
