How to Migrate Your Data and AI Workloads to Databricks With the AWS Migration Acceleration Program

Published: August 18, 2022

In this blog, we define the process for earning AWS customer credits when migrating data and AI workloads to Databricks on Amazon Web Services (AWS) with the AWS Migration Acceleration Program (MAP). We show how to use AWS MAP tagging to identify newly migrated workloads, such as Hadoop and enterprise data warehouse (EDW) workloads, so that they qualify for AWS customer credits. This information is useful for customers, technical professionals at technology and consulting partners, and AWS Migration Specialists and Solutions Architects.

Databricks overview

Databricks is the data and AI company. More than 7,000 organizations worldwide — including Comcast, Condé Nast, H&M and over 40% of the Fortune 500 — rely on the Databricks Lakehouse Platform to unify their data, analytics and AI. Founded by the original creators of Apache Spark™, Delta Lake and MLflow, Databricks is on a mission to help data teams solve the world’s toughest problems. Databricks is recognized by Gartner as a Leader in both Cloud Database Management Systems and Data Science and Machine Learning Platforms.

The Databricks Lakehouse on AWS unifies the best of data warehouses and data lakes in one simple platform to handle all your data, analytics and AI use cases. It’s built on an open and reliable data foundation that efficiently handles all data types and applies one common security and governance approach across all of your data and cloud platforms.

What is the AWS Migration Acceleration Program (MAP)?

The AWS Migration Acceleration Program (MAP) is a comprehensive and proven cloud migration program based upon AWS’s experience migrating thousands of enterprise customers to the cloud. Enterprise migrations can be complex and time-consuming, but MAP can help you accelerate your cloud migration and modernization journey with an outcome-driven methodology.

MAP provides tools that reduce costs and automate and accelerate execution, tailored training approaches and content, expertise from AWS Professional Services, a global partner network, and AWS investment. MAP also uses a proven three-phased framework (Assess, Mobilize, and Migrate and Modernize) to help you achieve your migration goals. Through MAP, you can build strong AWS cloud foundations, accelerate your migration while reducing risk, and offset the initial cost of migrations, so that you can take advantage of the performance, security, and reliability of the cloud.

Why do you need to tag resources?

Migrated resources must be identified with the map-migrated tag (the tag key is case sensitive) so that AWS can recognize them and provide credits, which serve as an incentive and reduce the cost of migration. Use the tagging process explained below for Hadoop, data warehouse, on-premises, or other cloud workload migrations to AWS.
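Because the tag key is case sensitive, a near miss such as `Map-Migrated` would not be recognized. The following is a minimal sketch of a pre-flight check, assuming the tags of a resource are available as a plain Python dict. The helper name `check_map_tag` and the sample value are illustrative and are not part of any AWS or Databricks API.

```python
MAP_TAG_KEY = "map-migrated"  # exact key; tag keys are case sensitive


def check_map_tag(tags):
    """Classify a dict of resource tags: 'ok', 'empty-value', 'near-miss' or 'missing'."""
    if MAP_TAG_KEY in tags:
        # The key is spelled exactly right; make sure a value was supplied.
        return "ok" if str(tags[MAP_TAG_KEY]).strip() else "empty-value"
    for key in tags:
        # Keys that differ only by case or surrounding whitespace will not match.
        if key.strip().lower() == MAP_TAG_KEY:
            return "near-miss"
    return "missing"


if __name__ == "__main__":
    # The value is a placeholder, not a real server ID.
    print(check_map_tag({"Map-Migrated": "d-server-EXAMPLE"}))
```

A check like this could run over an inventory of tagged resources before the migration starts, so that misspelled keys are caught early.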

Steps to Tag Migrated Resources

The following infographic provides an overview of the seven-step process:

Implement AWS MAP tagging in Databricks on AWS

Set up an AWS Organization account

Set up a Databricks Workspace

Set up your Databricks workspace via AWS CloudFormation or the Databricks account console in less than 15 minutes.

Activate AWS MAP Tagging

Provide the Migration Program Engagement ID (MPE ID), which you receive after signing an AWS MAP Agreement with your AWS representatives, in the CloudFormation stack that creates the dependent AWS objects. The stack creates Cost and Usage Reports (CUR) and generates a server ID for the AWS Migration Hub to use for migrations.

AWS CloudFormation template for generating server IDs and setting up Cost and Usage Reports

AWS CloudFormation template

Providing the MPE ID before initiating the AWS CloudFormation Stack for MAP

After the AWS CloudFormation stack runs successfully, copy the Migration Hub server IDs from the stack output and set them as the value of the map-migrated tag on the Databricks clusters used as the target clusters for the migration. In addition to the Databricks clusters, apply the same tag to the other AWS resources used for the migration, including Amazon S3 buckets and Amazon Elastic Block Store (EBS) volumes.
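If you prefer to read the stack output programmatically instead of copying it from the console, the step can be sketched as follows. This assumes a response in the shape returned by the CloudFormation `describe_stacks` call. The helper name `stack_outputs`, the output key `ExampleServerId`, and the value are placeholders, since the actual output names depend on the MAP template you deploy.

```python
def stack_outputs(describe_stacks_response):
    """Map OutputKey -> OutputValue for the first stack in a describe_stacks response."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        return {}
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stacks[0].get("Outputs", [])
    }


if __name__ == "__main__":
    # Hand-written sample in the response shape; key and value are placeholders.
    sample = {
        "Stacks": [
            {
                "Outputs": [
                    {"OutputKey": "ExampleServerId", "OutputValue": "d-server-EXAMPLE"}
                ]
            }
        ]
    }
    print(stack_outputs(sample))
```

With boto3 installed and credentials configured, the input could come from `boto3.client("cloudformation").describe_stacks(StackName=...)`, using the name of your own stack.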

Copying the server IDs from the AWS CloudFormation output to be used in MAP tagging

Databricks clusters being used for migration

Spin up the Databricks clusters for the migration and tag them with the map-migrated tag in one of three ways: (1) the Databricks console, (2) the AWS console, or (3) the Databricks API and its cluster policies.
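For the API route, cluster tags are carried in the `custom_tags` field of a cluster specification. The following is a minimal sketch, assuming you already have a cluster specification as a Python dict. The helper name `with_map_tag`, the cluster name, and all angle-bracket values are placeholders to replace with your own.

```python
import json


def with_map_tag(cluster_spec, server_id):
    """Return a copy of a cluster spec with map-migrated added to custom_tags."""
    spec = dict(cluster_spec)
    tags = dict(spec.get("custom_tags", {}))  # keep any tags already present
    tags["map-migrated"] = server_id
    spec["custom_tags"] = tags
    return spec


if __name__ == "__main__":
    # Placeholder spec; fill in a real runtime version and node type.
    base = {
        "cluster_name": "migration-target",
        "spark_version": "<runtime-version>",
        "node_type_id": "<node-type>",
        "num_workers": 2,
        "custom_tags": {"team": "data"},
    }
    print(json.dumps(with_map_tag(base, "d-server-EXAMPLE"), indent=2))
```

The function copies the specification instead of modifying it, so the same base specification can be reused for clusters that should not carry the tag.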

1. MAP tagging Databricks clusters using the Databricks console (preferred)

Amazon EBS volumes are automatically MAP tagged when tagging is done via the Databricks console

2. MAP tagging Databricks clusters via the AWS console

3. Databricks cluster tagging can be performed via cluster policies
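A cluster policy can pin the tag so that every cluster created under the policy carries it. The following is a minimal sketch of a policy definition, assuming the `custom_tags.<tag-name>` attribute path with a `fixed` rule. The helper name `map_tag_policy`, the policy name, and the server ID value are placeholders.

```python
import json


def map_tag_policy(name, server_id):
    """Build a request body for a policy that fixes the map-migrated tag."""
    definition = {
        # A 'fixed' rule forces this tag value on clusters created under the policy.
        "custom_tags.map-migrated": {"type": "fixed", "value": server_id}
    }
    # The definition is sent as a JSON-encoded string, not as a nested object.
    return {"name": name, "definition": json.dumps(definition)}


if __name__ == "__main__":
    print(json.dumps(map_tag_policy("map-migration-policy", "d-server-EXAMPLE"), indent=2))
```

A body of this shape could be submitted through the Databricks cluster policies API or pasted into the policy editor in the workspace. Check the current Databricks documentation for the exact endpoint and permissions.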

Be sure to tag the associated Amazon S3 buckets
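When tagging buckets programmatically, keep in mind that the S3 `PutBucketTagging` operation replaces the bucket's entire tag set, so existing tags should be merged in first. The following is a minimal sketch of that merge. The helper name `merge_tag_set` and the sample values are illustrative.

```python
def merge_tag_set(existing_tag_set, key, value):
    """Return a new S3-style TagSet with key set to value and other tags preserved."""
    merged = {tag["Key"]: tag["Value"] for tag in existing_tag_set}
    merged[key] = value  # adds the tag, or overwrites an older value
    return [{"Key": k, "Value": v} for k, v in merged.items()]


if __name__ == "__main__":
    # Placeholder tags; the server ID is not a real value.
    current = [{"Key": "team", "Value": "data"}]
    print(merge_tag_set(current, "map-migrated", "d-server-EXAMPLE"))
```

With boto3, the current tags could be read with `get_bucket_tagging(Bucket=...)` and the merged list written back with `put_bucket_tagging(Bucket=..., Tagging={"TagSet": merged})`.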

bucket tagging

Once all Databricks on AWS resources are tagged appropriately, perform the migration and track usage in AWS Cost Explorer. Organizations that have signed an AWS MAP Agreement and completed all the required steps will see credits applied to their AWS account. Remember to activate the MAP tags in the Cost Allocation Tags section of the AWS Billing console. After you deploy the CloudFormation template, the map-migrated tags may take up to 24 hours to appear in the Cost Allocation Tags section.
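Usage tracking can also be scripted. The following is a minimal sketch that builds the parameters for a Cost Explorer `get_cost_and_usage` call filtered and grouped by the map-migrated tag. It assumes the tag has already been activated as a cost allocation tag. The helper name `map_cost_query`, the dates, and the server ID are placeholders.

```python
from datetime import date


def map_cost_query(start, end, tag_values):
    """Build get_cost_and_usage parameters for costs carrying the map-migrated tag."""
    return {
        # End is exclusive in Cost Explorer, so use the first day after the period.
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "Filter": {"Tags": {"Key": "map-migrated", "Values": list(tag_values)}},
        "GroupBy": [{"Type": "TAG", "Key": "map-migrated"}],
    }


if __name__ == "__main__":
    query = map_cost_query(date(2022, 8, 1), date(2022, 9, 1), ["d-server-EXAMPLE"])
    print(query)
```

With boto3, the dict could be passed as `boto3.client("ce").get_cost_and_usage(**query)`.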

Activating Cost Allocation Tags

Automatically Delivered Cost and Usage Reports

To view the reports in the AWS console, go to Services > Billing > Cost & Usage Reports.

AWS Cost and Usage Reports

Summary

In this blog, we explained how to tag workloads migrated to Databricks on AWS under the AWS Migration Acceleration Program (MAP). Using tags to identify migrated workloads benefits customers through AWS credits. The steps include generating server IDs for the AWS Migration Hub, setting up cost allocation tags, applying MAP tags to the target Databricks clusters, automatically delivering Cost and Usage Reports, and tracking usage in Cost Explorer.

Questions? Email us at [email protected].

Additional Resources

AWS Migration Acceleration Program (MAP)

AWS Migration Acceleration Program
AWS Migration Acceleration Program Tagging Instructions Guide (Note: Refer to this guide for the latest CloudFormation template.)

Hadoop Migrations

SAS Migrations

Data Warehouse Migrations
