Databricks and Informatica Partner to Accelerate Development of Intelligent Data Pipelines
Joint Solution Gives Data Teams Faster Development and Complete Data Governance for Data Engineering Workloads
May 21, 2019
San Francisco, CA – May 21, 2019 – Databricks, the leader in unified analytics founded by the original creators of Apache Spark™, and Informatica, the enterprise cloud data management leader, announced a partnership to accelerate the development of intelligent data pipelines. As a result of the partnership, the companies introduced product integrations that provide rapid and efficient data ingestion, simplified creation of high-volume data pipelines, and integrated data governance for intelligent data discovery and end-to-end lineage. The partnership is being announced on the keynote stage by CEOs Ali Ghodsi and Anil Chakravarthy at Informatica World 2019, taking place now in Las Vegas.
Today, data engineering and data science teams depend on many hybrid data sources, which makes finding the right datasets and tracing the lineage of data through pipeline processing impossible. Bringing the Informatica capabilities for discovery, lineage, ingestion and preparation together with Databricks’ Unified Analytics Platform provides an analytics solution for intelligent data pipelines that leverages the correct datasets and provides end-to-end data lineage for analytics and machine learning implementations.
The Informatica and Databricks partnership introduces product integrations that allow faster development and complete governance for data engineering workloads:
- Informatica’s Cloud Data Integration and Databricks’ Unified Analytics Platform enable data teams to quickly ingest data directly into a managed data lake from hundreds of hybrid data sources.
- Informatica’s Big Data Management with Databricks’ Unified Analytics Platform allows data teams to easily create performant, scalable data pipelines for big data. Using Informatica’s visual drag-and-drop workflows, data teams can define their data pipelines to run on highly optimized Apache Spark™ clusters in Databricks to provide high performance at scale.
- Informatica’s Enterprise Data Catalog provides support for tracking data lineage of pipelines with Databricks’ Unified Analytics Platform, and makes Databricks tables available as part of the data catalog.
Informatica is also announcing support for Delta Lake, the new open source project from Databricks, to provide an analytics-ready place to store massive amounts of data. Delta Lake provides ACID transactions and schema enforcement that bring reliability at scale to data lakes and make high-quality datasets ready for downstream analytics.
“This seamless integration between Databricks and Informatica enables data engineers to easily discover the right datasets and ingest high volumes of data from multiple sources into Delta Lakes,” said Ghodsi, co-founder and CEO, Databricks. “This means joint customers can use the reliability and performance at scale from Databricks to make data ready for analytics and machine learning - and get intelligent governance to find, track and audit that data from end to end.”
“Trusted, high-quality data and efficient use of data users’ time are critical success factors for analytics and data science projects,” said Chakravarthy, CEO, Informatica. “Informatica’s support for Databricks allows data engineers to rapidly build serverless pipelines to ingest and govern data from a variety of sources at scale, while empowering data scientists using Databricks to quickly find and prepare the data for their analytics and data science projects in a self-service fashion.”
To learn more about this exciting partnership and joint solutions, sign up for an Informatica and Databricks joint webinar on June 11, 2019.
Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the original creators of Apache Spark, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks has secured investments from Andreessen Horowitz, Coatue Management, Microsoft, New Enterprise Associates (NEA), Battery Ventures, Green Bay Ventures, and Geodesic, among others, and has a global customer base that includes Viacom, Shell and HP.
Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.
Informatica is the only proven Enterprise Cloud Data Management leader that accelerates data-driven digital transformation. Informatica enables companies to fuel innovation, become more agile, and realize new growth opportunities, resulting in intelligent market disruptions. Over the past 25 years, Informatica has helped thousands of customers unleash the power of data. For more information, call +1 650-385-5000 (1-800-653-3871 in the U.S.), or visit www.informatica.com. Connect with Informatica on LinkedIn, Twitter, and Facebook.