
Databricks + Arcion: Real-time enterprise data replication to the Lakehouse


We are excited to announce that we have completed our acquisition of Arcion, a leading provider of real-time data replication technologies.

Arcion’s capabilities will enable Databricks to provide native solutions to replicate and ingest data from various databases and SaaS applications, so customers can focus on the real work of creating value and AI-driven insights from their data. We have worked closely with the team at Arcion for a number of years, not only as a Databricks partner but also as a Databricks Ventures portfolio company. With this announcement, we officially welcome the team to the Databricks family.

Real-time data ingestion and database replication

Our mission at Databricks is to democratize data and AI for every organization. To deliver on our mission, we built the Databricks Lakehouse Platform to offer a unified, open, and scalable platform for all your data, analytics, and AI. More than 10,000 organizations worldwide rely on the Lakehouse and have achieved best-in-class price/performance, together with unified governance, security and AI capabilities. 

However, a platform is only as valuable as the data in it. Before organizations can fully reap the benefits of the lakehouse, they must ingest, replicate, or migrate data from different source databases and applications. Data movement from different sources requires specialized knowledge of each source system, such as the nuances of unique SQL dialects, ingestion strategies, binary log protocols, and security challenges. Not only do these present significant friction in pipeline development, but they also create high operational overhead through brittle pipelines and complex, error-prone processes. This often manifests as frustrating delays in deriving value from data and a higher total cost of ownership (TCO).

Arcion will enable Databricks to natively provide a scalable, easy-to-use, and cost-effective solution to ingest real-time and on-demand data from various enterprise data sources. Arcion’s no-code, zero-maintenance Change Data Capture (CDC) pipeline architecture enables downstream analytics, streaming, and AI use cases through native connectors to over 20 enterprise database systems, such as Oracle, SQL Server, Teradata, and Snowflake, as well as SaaS applications such as Salesforce, SAP, and Workday. Each of these connectors provides automatic schema conversion and is adapted to the particular nuances of the source system. This minimizes the operational burden on customers' infrastructure and enables teams to deploy production-grade pipelines in minutes. Finally, Arcion further reduces DevOps overhead with built-in autoscaling, high availability, and live monitoring.
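For readers new to Change Data Capture, the core idea can be sketched in a few lines of Python. This is only an illustration of the general technique of applying an ordered change feed to a replica. It is not Arcion's implementation, and the event shape (`op`, `key`, `row`) and the `apply_changes` function are hypothetical names chosen for this example.

```python
from typing import Any, Dict, Iterable

# Hypothetical change-event shape (an assumption for illustration only):
# {"op": "insert" | "update" | "delete", "key": <primary key>, "row": {...}}
ChangeEvent = Dict[str, Any]
Table = Dict[Any, Dict[str, Any]]


def apply_changes(replica: Table, events: Iterable[ChangeEvent]) -> Table:
    """Apply change events, in order, to an in-memory replica keyed by primary key."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: merge the changed columns into the existing row, if any.
            replica[key] = {**replica.get(key, {}), **event["row"]}
        elif op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return replica


if __name__ == "__main__":
    target = {1: {"name": "Ada", "city": "London"}}
    feed = [
        {"op": "insert", "key": 2, "row": {"name": "Grace", "city": "NYC"}},
        {"op": "update", "key": 1, "row": {"city": "Paris"}},
        {"op": "delete", "key": 2},
    ]
    print(apply_changes(target, feed))
```

A production pipeline would read such events from the source database's transaction log and write them to a lakehouse table rather than a dictionary, and would also have to handle ordering guarantees, schema changes, and failure recovery, which this sketch leaves out.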


Figure 1.  Native connectors

A world-class team

Arcion was founded by database technologist & current CTO Rajkumar Sen. He was later joined by CEO Gary Hagmueller, a veteran in data and AI technologies. Raj’s vision for making log-based CDC simple and performant transformed Arcion into an industry-leading solution with the help of a team that brings over 140 combined years of experience in the data replication space. Arcion’s team of experts will be a great asset in helping accelerate our customers’ journey to the Lakehouse, and we are excited to be welcoming Raj and team to Databricks.

What’s next

We want to make it easy and fast for our customers to tap into relevant data sources in their enterprise. Earlier this year, we announced Lakehouse Federation to allow organizations to build a highly scalable and performant data mesh architecture with unified governance. Lakehouse Federation makes it simple for organizations to expose, query, and govern siloed data - no matter where it lives - as an extension of their lakehouse.

In the era of generative AI, it’s even more true that data is every company’s most valuable asset. For most customers, the vast amount of data locked inside legacy databases, data warehouses, and SaaS applications has tremendous potential to give them a competitive edge. 

With the integration of Databricks and Arcion’s data replication capabilities, we will further accelerate the promise of the Databricks Lakehouse Platform, helping customers across industries rapidly make decades of data available for both traditional analytics and generative AI applications. Look out in the coming months for announcements of many Arcion-powered capabilities that will dramatically simplify data replication and ingestion.