Databricks and DataStax

Today, DataStax and Databricks announced a partnership in which Apache Spark becomes an integral part of the DataStax offering, tightly integrated with Cassandra. We’re very excited to embark on this journey with DataStax for several reasons:

Integrating operational systems with analytics

One use case that Spark users have increasingly asked us about is building a closed-loop system: running advanced analytics directly on operational data and feeding the results back into the operational system so that it can adapt. The tight integration of Cassandra and Spark will let users achieve this goal, with Cassandra serving as the high-performance transactional database that powers online applications and Spark serving as a next-generation processing engine that delivers deeper insights faster, while data moves seamlessly between the two.

Spark beyond Hadoop

The most talked-about usage model for Spark to date has been within Hadoop deployments: Spark can operate directly on data in HDFS (without first moving the data) and runs natively on YARN, Hadoop's resource manager, as well as on Mesos. However, Spark’s applicability is much broader. It is designed to be a general Big Data processing engine, and the Spark/Cassandra integration is a prime example: data is processed natively, without first being moved in batch to Hadoop (or even requiring a Hadoop cluster). Furthermore, the recently announced Spark SQL will help optimize this integration further. Not only will Spark be able to access data stored in Cassandra directly, but it will also be able to execute selected parts of a query in Cassandra itself, then pull the resulting data set into Spark for machine learning and other advanced analytics.

Innovation in the Open

This partnership also brings together two groups with strong open source commitments and heritage. Databricks is focused on keeping Apache Spark 100% open source, and DataStax has invested heavily in growing the Apache Cassandra community, so it should be no surprise that a key tenet of this partnership is contributing joint innovation back to the open source community, driving greater integration between the Spark and Cassandra communities over time. Look for significant contributions as we move forward on this journey.

Please join us at the upcoming Spark Summit for a keynote by Martin Van Ryswyk, DataStax’s VP of Engineering, to hear more about the value of using Spark and Cassandra together and about additional innovations on the horizon.