
Back in May, we announced our partnership with Informatica to build out a rich set of integrations between our two platforms.

 

It’s been exciting work for the team because of what we can do for joint customers who combine our Managed Delta Lake with Informatica’s Big Data Management and Enterprise Data Catalog. That vision led us to use the term “Intelligent Data Pipelines,” which we outlined in our first blog post. Customers get a solution that lets data engineers quickly ingest high volumes of data from multiple hybrid sources into the cloud, stream it into an optimized data lake, and ensure that the data is properly governed, so that it is accurate and ready for downstream analytics and ML.

 

Migrating Big Data Workloads from On-premises Hadoop to the Cloud

Most recently, we focused specifically on organizations looking to migrate their big data workloads from on-premises Hadoop to the cloud. Those data teams still spend a lot of time on data preparation and ingestion rather than on higher-value advanced analytics and machine learning. Core Hadoop services such as YARN and HDFS are complex to manage, which results in a high total cost of ownership (TCO). Users have to manually configure and optimize clusters for scale-up and scale-down, which is time-consuming and directly impacts the reliability and performance of Hadoop-based data lakes.

 

Key Questions Concerning a Hadoop-to-Cloud Migration

 

Does migrating from Hadoop to the cloud relieve the operational burden of managing shared clusters? How do you manage compute and storage when migrating to the cloud? What are the key benefits of migrating to a cloud-native platform like Databricks? How does Databricks compare to YARN and HDFS?


Those questions are exactly the topic of this blog, co-authored by Informatica and Databricks. It is a detailed review of the architecture changes involved in migrating from Hadoop to Databricks, and for good measure it covers best practices for Hadoop migration that fully leverage the Databricks and Informatica data engineering integration. Check it out!
