Leverage Unused Compute Capacity for Data + AI With Azure Spot Instances and Azure Databricks
May 25, 2021 in Partners
Azure Databricks support for Microsoft Azure Spot Virtual Machines (Spot VMs) is now generally available. Together, Spot VMs and Azure Databricks help innovative customers like aluminum and energy producer Hydro accelerate data + AI workloads while optimizing costs. By using Spot VMs as workers for Azure Databricks clusters, you can save up to 90% on compute costs with minimal impact on workload completion time. You can also take advantage of unused compute capacity for warm instance pools to reduce Azure Databricks cluster start and auto-scaling times. Need to speed up job completion? Consider using more powerful instances or larger clusters with Azure Spot VMs.
What are Azure Spot VMs?
With Azure Spot VMs and spot pricing, you can access unused capacity at deep discounts (note that these discounts vary by Azure region, VM type, and the capacity Azure has available when the workload is deployed). You pay up to the maximum price that you optionally specify in advance.
Azure Databricks automatically handles the termination of Spot VMs by starting new pay-as-you-go worker nodes to guarantee your jobs will eventually complete. This provides predictability while helping to lower costs.
Whenever Azure needs the capacity back, the Azure infrastructure evicts Spot VMs. This eviction policy makes Spot VMs well suited for Azure Databricks, whose clusters are resilient to interruptions across a variety of data and AI use cases, such as ingestion, ETL, stream processing, AI models, batch scoring and more.
Spot VMs pricing and availability vary based on size, region, time of day, and more. When you deploy Spot VMs, Azure allocates them if spare capacity is available, but there is no SLA. When Azure needs the capacity back, Spot VMs are evicted with 30 seconds' notice.
Spot VMs differ from traditional pay-as-you-go instances, for which Azure guarantees availability but dictates the hourly price. While Spot VMs pricing varies, pay-as-you-go pricing rarely changes.
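To make the maximum-price rule described in this section concrete, here is a minimal Python sketch. It is a simplified model, not Azure's actual billing or eviction logic: the function names are hypothetical, the prices in the usage note are made up, and it assumes the convention that a maximum price of -1 means "cap at the pay-as-you-go price". Capacity-driven evictions are not modeled at all.

```python
def effective_price_cap(max_price, pay_as_you_go_price):
    """Return the highest hourly price you are willing to pay.

    Assumption: max_price == -1 means no explicit cap was chosen,
    so the pay-as-you-go price acts as the ceiling.
    """
    return pay_as_you_go_price if max_price == -1 else max_price


def evicted_for_price(current_spot_price, max_price, pay_as_you_go_price):
    """True if the current spot price exceeds the effective cap.

    This only models price-based eviction; Azure can also evict
    Spot VMs when it needs the capacity back.
    """
    return current_spot_price > effective_price_cap(max_price, pay_as_you_go_price)
```

For example, with an illustrative pay-as-you-go price of 0.20 per hour, `evicted_for_price(0.05, 0.04, 0.20)` is true because the spot price has risen above the 0.04 cap, while `evicted_for_price(0.05, -1, 0.20)` is false because the cap falls back to the pay-as-you-go price.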
View historical pricing for Azure Spot VMs
You can easily view the price history and the eviction rate for Spot VMs. (Please note that Spot VMs pricing does not include network, storage or other resources, which are billed separately.) To see historical Spot VMs pricing and eviction rates, navigate to the Create a virtual machine page within the Azure Portal and click “View pricing history and compare pricing in nearby regions.” This presents the historical pricing and eviction rates for the selected regions and instances.
There are several ways to use Spot VMs with Azure Databricks. Let’s take a look at how you can leverage them.
Create an Azure Databricks cluster with Spot VMs using the UI
When you create an Azure Databricks cluster, select your desired instance type and Databricks Runtime version, then choose “All Spot” under the On-demand/Spot option.
Create an Azure Databricks cluster with Spot VMs using the REST API
With the Azure Databricks Clusters REST API, you can choose your maximum Spot price and a fallback option for when Spot instances are unavailable or priced above your maximum. Create a bearer token in the Databricks UI; it is used to authenticate your API calls.
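As a rough illustration of such an API call, the following Python sketch builds a cluster specification and posts it to the Clusters API using only the standard library. The helper names are hypothetical, and the cluster name, node type, and runtime version in the usage note are placeholders. The `azure_attributes` fields (`first_on_demand`, `availability`, `spot_bid_max_price`) and the `/api/2.0/clusters/create` endpoint reflect the Clusters API as we understand it; confirm the accepted values against the current API documentation before relying on them.

```python
import json
import urllib.request


def build_spot_cluster_spec(name, node_type_id, spark_version, num_workers,
                            max_price=-1, fallback=True):
    """Build a Clusters API request body that asks for Spot VM workers."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        "azure_attributes": {
            # Keep the first node (the driver) on pay-as-you-go capacity.
            "first_on_demand": 1,
            # Fall back to pay-as-you-go workers if Spot VMs cannot be
            # acquired, or request Spot VMs only.
            "availability": ("SPOT_WITH_FALLBACK_AZURE" if fallback
                             else "SPOT_AZURE"),
            # Assumption: -1 caps the Spot price at the pay-as-you-go price.
            "spot_bid_max_price": max_price,
        },
    }


def create_cluster(workspace_url, token, spec):
    """POST the spec to the workspace, authenticating with a bearer token."""
    request = urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like `create_cluster(workspace_url, token, build_spot_cluster_spec("spot-demo", "Standard_DS3_v2", "7.3.x-scala2.12", 4))`, where `workspace_url` is your Azure Databricks workspace URL and `token` is the bearer token created in the UI. Building the request body separately from sending it makes the specification easy to inspect or log before any cluster is created.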
Create an Azure Databricks warm pool with Spot VMs using the UI
You can use Azure Spot VMs to configure warm pools. Clusters in the pool will launch with spot instances for all nodes, both driver and worker. When creating a pool, select the desired instance size and Databricks Runtime version, then choose “All Spot” from the On-demand/Spot option.
If spot instances are evicted due to unavailability, on-demand instances are deployed to replace evicted instances.
Create a warm pool with Spot VMs using the Instance Pools API
The Instance Pools API can be used to create warm Azure Databricks pools with Spot VMs. In addition to the options available in the Azure Databricks UI, the Instance Pools API enables you to specify a maximum Spot VMs price and fallback behavior if Spot VMs capacity is unavailable.
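A pool request can be sketched in the same way. The Python example below is a minimal illustration, assuming the `/api/2.0/instance-pools/create` endpoint and an `azure_attributes` object with `availability` and `spot_bid_max_price` fields. The helper names, default values, and the placeholders in the usage note are hypothetical, so check the Instance Pools API documentation for the values your workspace accepts.

```python
import json
import urllib.request


def build_spot_pool_spec(name, node_type_id, spark_version,
                         min_idle=2, idle_minutes=30, max_price=-1):
    """Build an Instance Pools API request body for a Spot VM warm pool."""
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # Number of idle instances kept warm for fast cluster starts.
        "min_idle_instances": min_idle,
        "idle_instance_autotermination_minutes": idle_minutes,
        # Runtime version preloaded on the pooled instances.
        "preloaded_spark_versions": [spark_version],
        "azure_attributes": {
            "availability": "SPOT_AZURE",
            # Assumption: -1 caps the Spot price at the pay-as-you-go price.
            "spot_bid_max_price": max_price,
        },
    }


def create_pool(workspace_url, token, spec):
    """POST the pool spec, authenticating with a bearer token."""
    request = urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/instance-pools/create",
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For instance, `build_spot_pool_spec("spot-pool", "Standard_DS3_v2", "7.3.x-scala2.12")` yields a request body for a pool that keeps two idle Spot instances warm; passing it to `create_pool` along with your workspace URL and bearer token submits the request.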
Learn more about using Azure Spot VMs with Azure Databricks by viewing the Azure Spot VMs documentation, Azure Databricks Clusters API documentation, Azure Databricks pools documentation and Instance Pools API documentation. To get started with Azure Databricks, visit databricks.com/azure and attend upcoming Azure Databricks events.