Azure Data Factory customers can now trigger Databricks Workflows using the new native Databricks Job activity, unlocking deeper integration between the two platforms. This best practice helps customers fully leverage the Databricks Data Intelligence Platform, including advanced features like Databricks SQL, DLT, and Power BI publishing. By migrating from Notebook activities to Workflows, customers can improve performance, reduce costs, and simplify operations across their data and AI pipelines.
Azure Databricks is a first-party Microsoft service, natively integrated with the Azure ecosystem to unify data and AI with high-performance analytics and deep tooling support. This tight integration now includes a native Databricks Job activity in Azure Data Factory (ADF), making it easier than ever to trigger Databricks Workflows directly within ADF.
This new activity in ADF is an immediate best practice, and all ADF and Azure Databricks users should consider moving to this pattern.
The new Databricks Job activity is very simple to use:
1. On the pipeline canvas, add the Databricks Job activity to your pipeline.
2. On the Azure Databricks tab, select or create the Azure Databricks linked service the activity will use to authenticate.
3. On the Settings tab, select a Databricks Workflow to execute from the Job drop-down list (you’ll only see the Jobs your authenticated principal has access to). In the Job Parameters section below it, configure any Job Parameters to send to the Databricks Workflow. To learn more about Databricks Job Parameters, see the docs.
That’s all there is to it. ADF will kick off your Databricks Workflow and return the Job Run ID and URL, then poll for the Job Run to complete. Read more below to learn why this new pattern is an instant classic.
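Conceptually, the activity’s trigger-and-poll behavior maps onto two Databricks Jobs API calls. The sketch below is an illustration only, assuming Jobs API 2.1, a hypothetical workspace URL and job ID, and a personal access token for brevity; the actual activity authenticates through your ADF linked service:

```python
import time

import requests

# Hypothetical values -- substitute your workspace URL, token, and job ID.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# 1. Trigger the Workflow (run-now), passing Job Parameters.
resp = requests.post(
    f"{HOST}/api/2.1/jobs/run-now",
    headers=HEADERS,
    json={"job_id": 123456, "job_parameters": {"run_date": "2025-01-01"}},
)
resp.raise_for_status()
run_id = resp.json()["run_id"]

# 2. Poll the run until it reaches a terminal state, surfacing the
#    Job Run ID and URL along the way, as the ADF activity does.
while True:
    run = requests.get(
        f"{HOST}/api/2.1/jobs/runs/get",
        headers=HEADERS,
        params={"run_id": run_id},
    ).json()
    print(run_id, run["run_page_url"], run["state"]["life_cycle_state"])
    if run["state"]["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        break
    time.sleep(30)

print("result:", run["state"].get("result_state"))  # SUCCESS on a clean run
```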
Using Azure Data Factory and Azure Databricks together has been a GA pattern since 2018, when it was released with this blog post. Since then, the integration has been a staple for Azure customers, who have primarily followed one simple pattern: an ADF Notebook activity submits each Databricks notebook as a one-off run on a job cluster defined in the ADF linked service (sketched below).
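To show what that classic pattern amounts to, the sketch below submits a single notebook run against the Jobs API. The notebook path, parameters, and cluster sizing are hypothetical, and ADF’s actual implementation details may differ:

```python
import requests

# Hypothetical workspace values for illustration.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# A one-time notebook run on a fresh job cluster -- roughly what each
# ADF Notebook activity kicks off (Jobs API 2.1 runs/submit).
resp = requests.post(
    f"{HOST}/api/2.1/jobs/runs/submit",
    headers=HEADERS,
    json={
        "run_name": "adf-notebook-activity",
        "tasks": [
            {
                "task_key": "notebook",
                "notebook_task": {
                    "notebook_path": "/Shared/etl/transform",  # hypothetical
                    "base_parameters": {"run_date": "2025-01-01"},
                },
                "new_cluster": {
                    "spark_version": "15.4.x-scala2.12",
                    "node_type_id": "Standard_DS3_v2",
                    "num_workers": 2,
                },
            }
        ],
    },
)
resp.raise_for_status()
print(resp.json()["run_id"])
```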
While this pattern has been extremely valuable over time, it has constrained customers to the following modes of operation, which rob them of the full value of Databricks:
- Only notebooks can be executed; other task types such as Python scripts and wheels, JARs, SQL files, dbt projects, and DLT pipelines are out of reach.
- Compute is defined in the ADF linked service, so each Notebook activity provisions its own cluster, with no cluster reuse across tasks and no serverless jobs compute.
- Orchestration logic such as dependencies, retries, and alerts lives in ADF, so Databricks Workflows capabilities like repair runs and task-level observability go unused.
While this pattern is scalable and native to Azure Data Factory and Azure Databricks, its tooling and capabilities have remained the same since launch in 2018, even as Databricks has grown by leaps and bounds into the market-leading Data Intelligence Platform across all clouds.
Azure Databricks goes beyond traditional analytics to deliver a unified Data Intelligence Platform on Azure. It combines industry-leading Lakehouse architecture with built-in AI and advanced governance to help customers unlock insights faster, at lower cost, and with enterprise-grade security. Key capabilities include:
- Delta Lake and an open lakehouse architecture for all data workloads
- Unity Catalog for unified governance across data and AI assets
- Databricks SQL for data warehousing and BI
- Workflows and DLT for orchestration and pipeline development
- Mosaic AI for building and deploying AI and ML solutions
With the release of the native Databricks Job activity in Azure Data Factory, customers can now execute Databricks Workflows and pass parameters to the Job Runs. This new pattern not only resolves the constraints highlighted above, it also unlocks Databricks features that were previously unavailable from ADF (a sketch follows the list):
- All Workflows task types, including Python scripts and wheels, JARs, SQL files, dbt projects, and DLT pipelines
- Serverless jobs compute and cluster reuse across tasks
- Control flow such as conditional execution, for-each loops, and repair runs
- Job-level parameters that ADF can set at trigger time
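To make that concrete, here is an illustrative sketch of a multi-task Workflow an ADF pipeline can now trigger: a DLT pipeline feeding a SQL task, with a job-level parameter the Job activity can override at trigger time. The pipeline, warehouse, and query IDs are hypothetical placeholders, and the job is created via the Jobs API 2.1 jobs/create endpoint:

```python
import requests

# Hypothetical workspace values for illustration.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
HEADERS = {"Authorization": "Bearer <personal-access-token>"}

# A multi-task Workflow mixing task types the Notebook activity cannot
# reach, plus a job-level parameter that ADF can set per pipeline run.
job = {
    "name": "adf-orchestrated-workflow",
    "parameters": [{"name": "run_date", "default": "2025-01-01"}],
    "tasks": [
        {
            "task_key": "ingest",
            "pipeline_task": {"pipeline_id": "<dlt-pipeline-id>"},  # hypothetical
        },
        {
            "task_key": "aggregate",
            "depends_on": [{"task_key": "ingest"}],
            "sql_task": {
                "warehouse_id": "<sql-warehouse-id>",  # hypothetical
                "query": {"query_id": "<saved-query-id>"},  # hypothetical
            },
        },
    ],
}

resp = requests.post(f"{HOST}/api/2.1/jobs/create", headers=HEADERS, json=job)
resp.raise_for_status()
print(resp.json()["job_id"])  # select this Job in the ADF Job activity
```

Once created, the job shows up in the Job drop-down on the activity’s Settings tab, and run_date can be overridden per pipeline run through Job Parameters.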
Most importantly, customers can now use the ADF Databricks Job activity to leverage the Publish to Power BI tasks in Databricks Workflows, which automatically publish Semantic Models from Unity Catalog schemas to the Power BI Service and trigger an import refresh for tables using the Import or Dual storage mode (see the setup instructions documentation). A demo of Power BI tasks in Databricks Workflows can be found here. To complement this, check out the Power BI on Databricks Best Practices Cheat Sheet – a concise, actionable guide that helps teams configure and optimize their reports for performance, cost, and user experience from the start.
The Databricks Job activity in ADF Is the New Best Practice
Using the Databricks Job activity in Azure Data Factory to kick off Databricks Workflows is the new best-practice integration between the two tools. Customers can immediately adopt this pattern to take advantage of the full Databricks Data Intelligence Platform. For customers on ADF, the Databricks Job activity will deliver immediate business value and cost savings. Customers whose ETL frameworks rely on Notebook activities should migrate those frameworks to Databricks Workflows and the new ADF Databricks Job activity, and should prioritize this initiative on their roadmap.
Get Started with a Free 14-day Trial of Azure Databricks.