Exciting new updates coming to Workflows in April
Databricks is excited to announce several new Workflows features that simplify how you create and launch automated jobs, while adding new capabilities for running the right tasks at the right time. Whether you are an experienced data engineer or just starting out with SQL query automation, these features are designed to streamline your workflow, boost productivity, and help you achieve your goals more efficiently. In this blog post, we walk through our most recent release and explain how you can use these new features to turbocharge the lakehouse.
File arrival triggers
In addition to running jobs on a schedule, many customers want to initiate a workflow when a certain event occurs. For this reason, we are introducing a powerful new feature called "file arrival triggers". With this functionality, you can configure a job to start whenever new files arrive in a cloud storage location. These triggers let workflows ingest data, run machine learning inference, or perform any other analysis as soon as the files land. Access to the cloud storage is governed through Unity Catalog external locations, which makes this functionality straightforward to adopt securely.
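If you prefer to configure jobs programmatically, a file arrival trigger can also be attached through the Jobs API. Below is a minimal sketch using the Jobs 2.1 create endpoint; the workspace URL, token, cluster ID, notebook path, and storage path are all placeholders, and the `file_arrival` field names reflect our understanding of the current API shape:

```python
import requests

# Placeholders -- substitute your own workspace URL, token, cluster ID,
# and a Unity Catalog external location path you can access.
WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

job_spec = {
    "name": "ingest-on-file-arrival",
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/Repos/team/ingest"},
            "existing_cluster_id": "<cluster-id>",
        }
    ],
    # Fire the job whenever new files land in this storage location.
    "trigger": {
        "pause_status": "UNPAUSED",
        "file_arrival": {
            "url": "s3://my-bucket/landing/",
            # Optional: wait at least 60s between consecutive triggered runs.
            "min_time_between_triggers_seconds": 60,
        },
    },
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # returns the new job_id
```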
File arrival triggers are now in public preview on Azure and AWS.
Continuous jobs
Continuous jobs allow you to orchestrate reliable workloads that run 24/7, such as Apache Spark™ Structured Streaming jobs. With this new capability, you no longer have to configure maximum concurrent runs or devise a special cron schedule; Workflows handles scheduling and retries for you. At Databricks we are obsessed with making Workflows simple to use, so we made configuring a continuous job easy: all it takes is a single click in the Triggers menu.
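Programmatically, the change is just as small. As a sketch, a job becomes continuous by adding a `continuous` block to its definition, which you would POST to the same `/api/2.1/jobs/create` endpoint as above (the notebook path and cluster ID are placeholders):

```python
job_spec = {
    "name": "streaming-pipeline",
    "tasks": [
        {
            "task_key": "stream",
            "notebook_task": {"notebook_path": "/Repos/team/stream"},
            "existing_cluster_id": "<cluster-id>",
        }
    ],
    # Run continuously: Workflows keeps one active run going and
    # restarts it on failure -- no cron schedule or
    # max-concurrent-runs tuning needed.
    "continuous": {"pause_status": "UNPAUSED"},
}
```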
Continuous jobs are now in public preview.
SQL files
Expanding the set of task types you can orchestrate with Databricks Workflows, it is now possible to run SQL queries defined in files on Databricks SQL warehouses. You could already schedule predefined Databricks SQL queries, run alerts, and update dashboards. The new SQL file task lets you store .sql files in a Git repository; every time the job runs, the latest version of the file is fetched from a specific branch. This new functionality makes it easy to version notebooks and SQL queries together, and using Git for these artifacts improves collaboration between team members and reduces the risk of errors.
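For illustration, here is a sketch of how a SQL file task might look in a Jobs API payload; the repository URL, branch, file path, and warehouse ID are placeholders, and the `sql_task` and `git_source` field names are our best understanding of the API shape:

```python
job_spec = {
    "name": "daily-revenue-report",
    # Pull task source from Git: the latest commit on this branch
    # is fetched on every run.
    "git_source": {
        "git_url": "https://github.com/my-org/analytics",
        "git_provider": "gitHub",
        "git_branch": "main",
    },
    "tasks": [
        {
            "task_key": "run_report",
            "sql_task": {
                # Path to the .sql file within the repository.
                "file": {"path": "queries/daily_revenue.sql"},
                "warehouse_id": "<sql-warehouse-id>",
            },
        }
    ],
}
```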
SQL files task is now Generally Available.
User interface improvements
Last but certainly not least, we have also improved the Databricks Workflows user interface for creating and editing jobs. With the goal of making your life as simple as possible, the new interface lets you change an individual task without losing track of the overall job structure and settings, making it easier to build and manage complex workflows.
Summary
Databricks is constantly working to improve the Workflows product. In this blog post, we introduced new ways to control when your jobs run, a new way of orchestrating version-controlled SQL queries, and a new user interface that simplifies creating and editing your workflows.
We cannot wait to see how these features will help you get more out of the lakehouse. What are you most excited about? Try them out in Workflows today!