
What is Orchestration?

Automated coordination of complex workflows and data pipelines: scheduling tasks, managing dependencies, monitoring execution, and handling failures across systems

by Databricks Staff

  • Manages complex dependencies between data pipeline tasks using directed acyclic graphs (DAGs) to define execution order, enabling parallel processing where possible and ensuring prerequisite tasks complete before dependent steps execute
  • Provides monitoring dashboards, alerting systems, and retry logic to track pipeline health, detect failures quickly, and automatically recover from transient errors without manual intervention
  • Supports scheduling triggers based on time intervals, data availability, or external events, coordinating ETL workflows, model training pipelines, and multi-stage analytics processes across diverse compute resources

What is data orchestration?

Data orchestration is the process of organizing and managing data tasks, such as moving, transforming, validating, and delivering data, so they run in the correct order, at the right time, and at scale.

In a typical data system, many steps are involved: you need to collect data from different sources, clean and transform it, check its quality, and load it into databases, dashboards, or apps. Data orchestration connects all these steps into a coordinated workflow to address your organization's needs. It decides when each task should start, what must finish first, and what to do if something goes wrong. Data orchestration is particularly useful whenever a process is repeatable, and tasks can be automated. It can save time, improve the efficiency and performance of your system, and ensure better data quality.

In simple terms, data orchestration makes sure the entire data process happens smoothly, reliably, and on time.

Common data orchestration tools include Apache Airflow, Prefect, Dagster, and platform-integrated options like Databricks Lakeflow Jobs.
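
As a concrete illustration, here is a minimal sketch of what such a workflow can look like in Apache Airflow (assuming Airflow 2.4+; the pipeline name, task logic, and schedule are placeholders): three dependent tasks run daily, with automatic retries on transient failures.

```python
# A minimal sketch of a daily ETL workflow as an Apache Airflow DAG.
# The DAG id, task logic, and schedule are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("extract: pull raw data from source systems")


def transform():
    print("transform: clean and reshape the extracted data")


def load():
    print("load: write the transformed data to the warehouse")


with DAG(
    dag_id="daily_sales_etl",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # time-based trigger
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Dependencies define the execution order: extract -> transform -> load.
    extract_task >> transform_task >> load_task
```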

Data orchestration differs from other types of orchestration that exist in the developer space:

  • Container orchestration: Container orchestration is the automation of container management and coordination. Software teams (DevOps, platform engineers, etc.) use container orchestration tools like Kubernetes and Docker Swarm to control and automate tasks such as provisioning and deployments of containers, allocation of resources between containers, health monitoring of containers and securing interactions between containers.
  • Application orchestration: Application orchestration is the integration of two or more software applications. You might do this to automate a process or to enable real-time syncing of data. The application orchestration process allows you to manage and monitor your integrations centrally and add capabilities for message routing, security, transformation and reliability. This approach is more effective than point-to-point integration because the integration logic is decoupled from the applications themselves and managed in a central orchestration (middleware) layer instead.
  • Security orchestration (SOAR): Security orchestration, automation and response (SOAR) is an approach that combines automation and orchestration, and allows organizations to automate threat-hunting, the collection of threat intelligence and incident responses to lower-level threats.

What is the difference between data orchestration and ETL?

ETL (Extract, Transform, Load), along with its variant ELT (Extract, Load, Transform), is the process that actually moves and reshapes data: it pulls data from sources (extract), cleans and shapes it for a specific business need (transform), and then loads the data into a target system such as a data warehouse (load).

Data orchestration sits above ETL as the coordination layer that decides when and how the ETL process runs. It focuses on controlling and coordinating data tasks, including: deciding when jobs should run, controlling which jobs run first, handling failures and retries, sending alerts, tracking dependencies, and more.

In short, ETL handles the data work, while orchestration manages it so the output is reliable and timely.
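
To make the division of labor concrete, here is a toy sketch (not tied to any particular tool) in which the ETL functions do the data work while a small coordination layer decides the order, retries failed steps, and surfaces errors:

```python
# A toy illustration only: the ETL functions handle the data work, while the
# run_pipeline() loop plays the role of the orchestration layer (ordering,
# retries, failure reporting). Real orchestrators add scheduling, parallelism,
# dependency graphs, and persistent state on top of this idea.
import time


def extract():
    print("extract: pull data from sources")


def transform():
    print("transform: clean and reshape the data")


def load():
    print("load: write results to the warehouse")


def run_pipeline(steps, retries=2, delay_seconds=5):
    """Run steps in order; retry a failed step a few times before giving up."""
    for step in steps:
        for attempt in range(1, retries + 2):
            try:
                step()
                break  # step succeeded, move on to the next one
            except Exception as exc:
                if attempt > retries:
                    raise RuntimeError(f"{step.__name__} failed after {retries} retries") from exc
                time.sleep(delay_seconds)  # wait before retrying a transient failure


run_pipeline([extract, transform, load])
```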

How does data orchestration work?

Data orchestration helps data teams automate their data engineering process: it takes siloed data from multiple storage locations, combines and organizes it, and makes it readily available for business intelligence (BI), analytics, and machine learning needs.

The process connects all your data sources and systems, whether they’re legacy systems, cloud-based tools, or data lakes. The data is transformed into a standard format, making it easier to understand and use for decision-making.

Most organizations generate vast amounts of data, which is why automated tools are essential for organizing it at scale and ensuring it is available in a timely manner for downstream use cases. In addition, data orchestration platforms are ideal for ensuring compliance, monitoring pipeline health and performance, and detecting issues through observability.

What are the key benefits of using a data orchestration tool?

Using the right data orchestration solution will give you:

  • Improved reliability: data pipelines run predictably with clear dependencies, automated retries, and actionable alerts
  • Stronger data quality: embedded validations and checks to catch bad data early
  • Greater transparency: logs, metrics, and lineage make operations observable
  • Timeliness: fresh data delivered on schedule or on events
  • Cost efficiency: avoid redundant reprocessing and scale resources wisely
  • Governance: auditable runs, access controls, and policy enforcement

What are some challenges that can come from using the wrong data orchestration tool?

Some data orchestrators might come with limitations, which can lead to:

  • Complex workflows: Tangled pipelines that make dependencies and failure paths difficult to understand or maintain.
  • Limited scheduling intelligence: Timer-based scheduling without dependency awareness, data-quality checks, or robust retry logic.
  • Weak observability: Limited logs, metrics, or lineage, slowing troubleshooting and root cause analysis.
  • Alert fatigue: Noisy notifications with low signal that overwhelm operators.
  • Rigid workflow support: Poor handling of backfills, event-driven triggers, or dynamic pipelines.
  • Configuration sprawl: Growing configuration complexity and vendor-specific lock-in that reduce portability and version control.
  • Security limitations: Gaps in governance, such as insufficient role-based access controls.

Orchestrators will struggle to perform well when workflows are highly dynamic, span multiple systems, require strong data contracts, or must scale to high concurrency without sacrificing reliability. Choose platforms that explicitly address these areas, and keep your data pipelines modular and observable.

What are key components of a data orchestration solution?

In order to orchestrate your data easily and efficiently, data orchestration solutions should include the following features:

  • Task dependency: A task dependency sets the order and conditions between tasks, enabling sequence, parallelism, and branching across a workflow.
  • Task types: Data orchestration solutions should support a range of task types, including but not limited to notebooks, Python scripts, SQL, dbt, JAR, Spark Submit, and more.
  • Parameters: Parameters are named, typed inputs that you pass into an orchestration run (pipeline, DAG, workflow) to control behavior without changing code. They make workflows reusable, configurable, and easier to promote across environments.
  • Schedules: A schedule is a time-based setting that runs a task at specific times (for example, hourly, daily, or via cron).
  • Triggers: A trigger is the mechanism that starts a task based on a condition or event (time-based, event, or data-driven).
  • Control flow: Control flow features let you define the shape of task execution so you can build dynamic, resilient workflows. They typically include retries (how many times a failed task should be re-run), sequencing, parallelism, branching, and loops (“run if”, “if/else”, and “for each” conditional tasks); a brief sketch follows this list.
  • Conditional runs: Orchestration tools should allow you to set conditions for your runs.
  • Backfill runs: A backfill run is a job execution (often a series of runs) that reprocesses historical data over a past date/time range to fill gaps or recompute results.
  • Observability: Observability for data engineering is the ability to discover, monitor, and troubleshoot systems to ensure the ETL operates correctly and effectively. It is the key to maintaining healthy and reliable data pipelines, surfacing real business insights, and delivering trustworthy downstream analytics.
  • Governance: Orchestration tools should include data governance to manage privileges, including permission grants and identities, and assets.
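
To illustrate several of these components together (parameters, branching control flow, and scheduling), here is a hedged sketch using Apache Airflow 2.x; the parameter, task names, and schedule are placeholders rather than a recommended design.

```python
# A sketch of parameters and control flow in an Airflow DAG: a run parameter
# chooses between a full and an incremental load, and a branch task routes
# execution accordingly. All names and values are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.models.param import Param
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator


def choose_load_type(params, **_):
    # Branching: return the task_id of the path that should run.
    return "full_load" if params["load_type"] == "full" else "incremental_load"


with DAG(
    dag_id="parameterized_load",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # cron schedule: 02:00 every day
    catchup=False,
    params={"load_type": Param("incremental", enum=["incremental", "full"])},
) as dag:
    branch = BranchPythonOperator(task_id="choose_load_type", python_callable=choose_load_type)
    full_load = EmptyOperator(task_id="full_load")
    incremental_load = EmptyOperator(task_id="incremental_load")
    # "none_failed" lets the join run even though one branch was skipped.
    publish = EmptyOperator(task_id="publish", trigger_rule="none_failed")

    branch >> [full_load, incremental_load] >> publish
```

Backfills are usually triggered separately from the regular schedule; in Airflow, for example, the `airflow dags backfill` CLI re-runs a DAG over a past date range.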

Who is responsible for data orchestration?

While most companies rely on their data engineering team for data orchestration, data analysts and data scientists can also manage this role. More rarely, some organizations have business users or DevOps practitioners orchestrate their data.


AI and data orchestration

AI is transforming data orchestration by adding intelligent decision-making, predictive analytics capabilities, and adaptive optimization to automated workflows.

AI enhances orchestration
Traditional orchestration follows predefined rules and sequences. AI-powered orchestration goes further by learning from historical data, predicting outcomes and adjusting workflows based on real-time conditions. This enables orchestration systems to become more autonomous, efficient and resilient.

Key capabilities of AI-powered orchestration

  • Predictive workflow optimization: AI analyzes past workflow executions to predict bottlenecks, resource needs and potential failures before they occur, automatically adjusting resource allocation and task scheduling
  • Intelligent error handling: Instead of simply retrying failed tasks, AI-powered orchestration can diagnose root causes, suggest remediation strategies and automatically route workflows through alternative paths
  • Anomaly detection: Machine learning models continuously monitor orchestrated workflows to detect unusual patterns, performance degradation or security threats in real time (a minimal sketch follows this list)
  • Adaptive resource management: AI dynamically allocates computational resources based on predicted workload demands, optimizing costs while maintaining performance
  • Natural language interfaces: AI enables users to create, modify and monitor orchestration workflows using conversational interfaces, making orchestration more accessible to nontechnical users
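
As a minimal illustration of the anomaly detection idea, the sketch below flags a task run whose duration is a statistical outlier relative to recent history; the data, threshold, and metric are illustrative and not tied to any specific orchestrator.

```python
# A minimal, library-free sketch: flag a workflow run whose duration deviates
# strongly from recent history. Thresholds and sample values are illustrative.
from statistics import mean, pstdev


def is_anomalous(history_minutes, latest_minutes, z_threshold=3.0):
    """Return True if the latest run duration is a statistical outlier."""
    if len(history_minutes) < 5:
        return False  # not enough history to judge
    mu = mean(history_minutes)
    sigma = pstdev(history_minutes) or 1e-9  # avoid division by zero
    return abs(latest_minutes - mu) / sigma > z_threshold


recent_runs = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9]  # past run durations in minutes
print(is_anomalous(recent_runs, 34.0))  # True: this run took far longer than usual
```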

AI/ML workload orchestration
Data orchestration is particularly valuable for managing machine learning pipelines, where it can automate model training, testing, deployment and retraining cycles based on model performance metrics and data drift detection.
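
One simple way to picture this is a retraining gate that the orchestrator evaluates on a schedule; the metric names and thresholds below are hypothetical.

```python
# An illustrative gate for conditional retraining in an orchestrated ML
# pipeline: retrain when accuracy has degraded or input drift is too high.
# Metric names and thresholds are hypothetical, not a recommended policy.
def should_retrain(current_accuracy, baseline_accuracy, drift_score,
                   max_accuracy_drop=0.05, max_drift=0.2):
    accuracy_degraded = (baseline_accuracy - current_accuracy) > max_accuracy_drop
    data_drifted = drift_score > max_drift
    return accuracy_degraded or data_drifted


# The orchestrator would run this check periodically and, if it returns True,
# trigger the downstream training, evaluation, and deployment tasks.
print(should_retrain(current_accuracy=0.87, baseline_accuracy=0.94, drift_score=0.12))  # True
```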

How to select your data orchestration tool

Choosing the right data orchestration solution depends on your specific needs. When selecting your orchestrator, consider the following:

Use case alignment
Orchestration tools are often tailored for particular tasks. Identify your main objectives—such as building data pipelines, managing application deployment, or automating cloud infrastructure—and choose a tool that addresses these priorities directly. Evaluate features specific to your requirements, for example, database integration for data pipelines or container management support for deployment workflows.

Scalability
Consider current and projected data volume, workflow complexity, and user base. Some platforms perform well with small teams or pilot projects but struggle at enterprise scale. Assess support for horizontal scaling, distributed execution, and high availability to ensure the tool will handle future growth without performance loss.

Integration capabilities
Technology ecosystems vary widely—verify the orchestration platform’s compatibility with your current tech stack, APIs, and security protocols. Check for built-in integrations with essential data stores, compute environments, version control systems, and monitoring or alerting services. Robust integration reduces manual work and failure points.

Ease of use
Look for a balance between flexible scripting capabilities and clear visual interfaces. Intuitive workflow editors make it easier for different team members—including those without deep programming backgrounds—to design, monitor, and troubleshoot pipelines. Comprehensive documentation and an active user community also contribute to a smoother experience.

Ease of maintenance
Evaluate how the tool manages upgrades, dependency changes, and error handling. Strong logging, clear troubleshooting tools, and automated recovery options reduce the operational burden and prevent minor issues from becoming major outages. Consider the available support resources for ongoing maintenance.

Financial cost
Examine pricing models—subscription, usage-based, or open source—and weigh them against your budget and anticipated scale. Factor in licensing, infrastructure, and long-term operational costs, not just initial setup, to avoid later surprises.

When does it make sense to buy a data orchestrator vs. build one?

It depends on your team's and organization's needs and on what you want to prioritize: maturity vs. customizability, maintenance vs. flexibility, and so on. Below are more details to help you find the right approach.

When to buy:

  • You need ready-made workflow orchestration — DAG authoring with conditional logic, loops, and support for notebooks, Python, SQL/dbt, and external tasks.
  • Your pipelines rely on event triggers — file arrivals, table updates, or schedules that require continuous execution without building custom schedulers.
  • You require built-in reliability features — retries, timeouts, targeted repairs/backfills, and alerting to meet SLA requirements.
  • Observability is critical — run graphs, timelines, logs, metrics, and lineage for debugging and performance monitoring.
  • Governance and security matter — lineage, auditing, and role-based access controls integrated with the data catalog.
  • You want native integrations — built-in connections to tools (for example, BI refresh tasks) instead of stitching together automations.
  • You want less infrastructure to manage — platform-native orchestrators that avoid operating a separate system.

When to build:

  • Your orchestration logic is highly specialized — cyclic workflows, custom resource arbitration, or transactional gating beyond standard DAG models.
  • You need deep integration with proprietary systems — custom runtimes, internal APIs, or strict on-prem/offline requirements.
  • You accept long-term engineering ownership — maintaining orchestration UIs, DSLs, retries, observability layers, security, and upgrades.

Decision checklist:

| Decision factor | Questions to ask | When buying usually makes sense |
| --- | --- | --- |
| Workload complexity | Do workflows include many tasks, cross-system dependencies, conditional logic, or parallel branches? | Off-the-shelf orchestrators support DAGs, dynamic task iteration, concurrency controls, and failure recovery. |
| Triggering model | Do pipelines rely on schedules, file arrivals, table updates, or streaming triggers? | Buying avoids building and maintaining custom schedulers and event triggers. |
| Reliability operations | Do you need retries, timeouts, repair runs, and automated notifications? | Built-in reliability features reduce the need for custom error-handling frameworks. |
| Observability & governance | Do teams require run histories, logs, metrics, cost insights, or lineage tracking? | Commercial tools provide integrated observability and governance out of the box. |
| Integrations | Do workflows orchestrate notebooks, scripts, dbt, SQL, or BI refreshes across systems? | Native integrations simplify cross-tool orchestration without building connectors. |
| Performance & cost controls | Do workloads require autoscaling, resource pools, or cost guardrails? | Platform-native orchestration can manage compute scaling and workload efficiency automatically. |

The short answer is:

  • Default to buy: if two or more of the “buy” criteria apply, a commercial/natively integrated orchestrator will be faster to adopt and cheaper to operate over time.
  • Build only when requirements are exceptional and stable, and you have clear ownership and resourcing for multi-year maintenance.

Key data orchestration use cases

The following are practical examples of how different sectors leverage data orchestration.

Financial services
Financial institutions use data orchestration to manage fraud detection pipelines, processing transaction data in real time across multiple systems. Orchestrated workflows automatically flag suspicious activities, trigger verification processes and update risk models while maintaining compliance with regulatory requirements and audit trails.

Healthcare
Healthcare organizations orchestrate patient data flows between electronic health records (EHR), lab systems, imaging platforms and billing systems. For example, when a patient visits multiple departments, orchestration ensures that test results, diagnoses and treatment plans are synchronized across all systems, enabling coordinated care while maintaining HIPAA compliance.

E-commerce and retail
Retailers use data orchestration to manage inventory, pricing and customer data across online stores, physical locations and third-party marketplaces. Orchestrated workflows automatically update stock levels, trigger reorder processes, adjust pricing based on demand and personalize customer recommendations in real time.

Manufacturing and supply chain
Manufacturers orchestrate workflows that connect IoT sensors, production systems, quality control and logistics platforms. Data orchestration enables predictive maintenance by coordinating data from equipment sensors, triggering maintenance workflows before failures occur and automatically adjusting production schedules.

Media and entertainment
Streaming platforms use data orchestration to manage content delivery pipelines, from ingestion and transcoding to distribution across global content delivery networks (CDNs). Orchestrated workflows ensure content is processed, optimized for different devices and delivered with minimal latency.

Telecommunications
Telecom providers orchestrate network functions, service provisioning and customer onboarding processes. When a new customer signs up, orchestration coordinates identity verification, service activation, billing setup and network configuration across multiple back-end systems.

FAQ

What is data orchestration and why is it essential?
Data orchestration is the automated coordination of data workflows such as ingestion, transformation, validation, and delivery across multiple systems.

It ensures pipelines run in the correct order with monitoring, retries, and dependency management. Data orchestration is essential because modern data environments span many tools and sources, and automation prevents pipeline failures, delays, and data quality issues.

What role does orchestration play in supporting AI and analytics?
Data orchestration supports AI and analytics by ensuring data pipelines run reliably and deliver trusted data to downstream systems. It helps by:

  • Automating data pipelines: coordinating ingestion, transformation, validation, and delivery across systems
  • Ensuring data reliability: managing dependencies, retries, and pipeline monitoring
  • Maintaining data quality: integrating validation checks and governance controls
  • Delivering timely data: ensuring models, dashboards, and applications receive fresh, production-ready datasets

How can data teams integrate orchestration with existing tools and pipelines?
Data teams integrate orchestration with existing tools by connecting ingestion systems, transformation frameworks, and analytics platforms into coordinated workflows.

Platforms like Databricks support this through connectors, APIs, and integrations with tools such as dbt, notebooks, and SQL pipelines. Open formats like Delta Lake and Apache Iceberg also enable interoperability across the broader data ecosystem.

How much does orchestration software cost?
Orchestration software costs vary widely depending on the platform and scale. Open source tools like Apache Airflow are free to license but still incur infrastructure and maintenance costs. Cloud-based platforms typically charge based on workflow executions, data volume or compute resources, ranging from hundreds to thousands of dollars per month.

When evaluating costs, consider licensing fees, infrastructure requirements, implementation time and training needs. Many vendors offer free tiers or trials. Remember that the total cost should be weighed against the efficiency gains and cost savings achieved through automation.

What skills are required for orchestration?
Core skills for orchestration include:

  • Programming: Familiarity with Python, SQL or Bash for workflow logic
  • Data pipeline knowledge: Understanding of ETL processes and data integration
  • Systems architecture: Knowledge of how systems, APIs and cloud services interact
  • DevOps practices: Experience with CI/CD, version control and infrastructure as code

Your data team doesn’t have to learn extensive new skills to benefit from orchestration. Many modern platforms offer user-friendly interfaces, visual workflow builders and pre-built templates that reduce technical barriers.

Which orchestration tool should I choose?
Choosing the right tool depends on your specific needs. Consider the following:

  • Use case alignment: Match the tool to your primary needs — data pipelines, application deployment or cloud infrastructure
  • Scalability: Ensure the platform can handle current and future volumes
  • Integration capabilities: Verify compatibility with your existing systems
  • Ease of use: Balance code-based flexibility with visual workflow designers
  • Cost structure: Assess whether pricing aligns with your budget

Data orchestration with Databricks

With Lakeflow Jobs, data orchestration is fully integrated into Databricks as part of Lakeflow, the unified data engineering platform. It requires no additional infrastructure or DevOps resources and comes with a flexible authoring experience, built-in observability, and serverless processing.

In Lakeflow, serverless processing is fully managed compute that Databricks provisions, optimizes, and scales for you, so you run data pipelines and jobs without configuring or operating clusters yourself. In Lakeflow Jobs, this means you can orchestrate notebooks, Python scripts, dbt, Python wheels, and JARs on serverless compute, with Standard and Performance Optimized modes to trade off startup latency and cost.
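
For illustration, the sketch below defines a small two-task job with the Databricks SDK for Python (databricks-sdk); the notebook paths, job name, and schedule are placeholders, and exact field names can vary by SDK version.

```python
# A minimal sketch of a two-task job defined with the Databricks SDK for Python.
# Notebook paths and the job name are placeholders; because no cluster is
# specified, the tasks can run on serverless compute where it is enabled.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reads credentials from the environment or a config profile

job = w.jobs.create(
    name="nightly_orders_pipeline",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/pipelines/ingest"),
        ),
        jobs.Task(
            task_key="transform",
            depends_on=[jobs.TaskDependency(task_key="ingest")],  # runs after ingest succeeds
            notebook_task=jobs.NotebookTask(notebook_path="/Workspace/pipelines/transform"),
        ),
    ],
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # 02:00 daily (Quartz cron syntax)
        timezone_id="UTC",
    ),
)
print(f"Created job {job.job_id}")
```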

