
Databricks Workflows is a fully managed service on Databricks that makes it easy to build and manage complex data and ML pipelines in your lakehouse without having to operate the underlying infrastructure.

Sometimes, a task in an ETL or ML pipeline depends on the output of an upstream task. An example is evaluating the performance of a machine learning model and then having a downstream task decide whether to retrain the model based on the resulting metrics. Since these are two separate steps, it is best to have separate tasks perform the work. Previously, passing information from one task to the next required storing it outside of the job's context, such as in a Delta table.

Databricks Workflows is introducing a new feature called "Task Values", a simple API for setting and retrieving small values from tasks. Tasks can now output values that can be referenced in subsequent tasks, making it easier to create more expressive workflows. Looking at the history of a job run also provides more context, by showcasing the values passed by tasks at the DAG and task levels. Task values can be set and retrieved through the Databricks Utilities API.
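As a minimal sketch of how this works in a notebook task: the upstream task sets a value with dbutils.jobs.taskValues.set, and a downstream task reads it with dbutils.jobs.taskValues.get. The task name "evaluate_model" comes from the example above; the key name "accuracy", the threshold, and the metric value are illustrative assumptions, not part of the feature itself.

```python
# In the "evaluate_model" task (dbutils is available in Databricks notebooks):
# publish a small value so downstream tasks can reference it.
accuracy = 0.87  # illustrative metric computed by the evaluation step
dbutils.jobs.taskValues.set(key="accuracy", value=accuracy)
```

```python
# In a downstream task: read the value emitted by "evaluate_model".
# debugValue is what get() returns when the notebook runs outside of a job.
accuracy = dbutils.jobs.taskValues.get(
    taskKey="evaluate_model",
    key="accuracy",
    default=0.0,
    debugValue=0.0,
)

if accuracy < 0.8:  # illustrative threshold for deciding to retrain
    print("Accuracy below threshold, triggering retraining step")
```

Because the value travels with the job run itself, there is no need to stand up an external table just to hand a metric from one task to the next.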

The history of the run shows that the "evaluate_model" task has emitted a value. Clicking on the task shows the values it emitted.

Task values are now generally available. We would love for you to try out this new functionality and tell us how we can improve orchestration even further!
