
Databricks and IBM Collaborate to Advance Machine Learning in the Apache Spark Project

Companies Announce Key Initiatives to Drive Machine Learning Adoption

June 15, 2015

SAN FRANCISCO, CA--(Marketwired - Jun 15, 2015) - Databricks, the company founded by the creators of the popular open source Big Data processing engine Apache Spark, and IBM today announced a joint effort to contribute key machine learning capabilities to the Apache Spark Project. The announcement was made at Spark Summit in San Francisco, an event that brings together the growing Apache Spark and Databricks communities with leading production users of Spark, Spark SQL, Spark Streaming and related projects.

Apache Spark is an open source data processing engine built for speed, ease of use, and sophisticated analytics. Spark is designed to handle both batch processing and newer workloads such as streaming, interactive queries, and machine learning. Spark recently won the 2014 Gray Sort competition, a third-party benchmark measuring how fast a system can sort 100TB of data (1 trillion records), and it has become the largest open source community in big data, with over 500 contributors from more than 200 organizations. To date, there are over 500 active Spark deployments in production, and that number continues to grow.

This announcement further validates IBM's commitment to help democratize big data and analytics through its continued investment in the development of open source projects. Over the next few months, IBM and Databricks will collaborate to expand Spark's machine learning capabilities. The companies plan to introduce new domain-specific algorithms to the Spark ecosystem and to add new machine learning primitives to the Apache Spark Project. IBM and Databricks will also work together to integrate IBM's SystemML, a robust machine learning engine for large-scale data, with the Spark platform.

Together, Databricks and IBM aim to make it possible for data scientists and engineers to build models quickly and iterate faster as the needs of the business change. As a result, more people can apply deep intelligence to a broad array of business applications, including the Internet of Things (IoT), e-commerce, mobile, social and enterprise technology. Spark's growing community, coupled with its robust library of algorithms in MLlib, simple-to-use APIs, in-memory compute engine, and scalability, makes it the ideal framework for data professionals building fast, scalable machine learning applications.

"Spark is undoubtedly a force to be reckoned with in the big data ecosystem. Collaborating with Databricks is the next logical step towards delivering next-generation applications for our customers," said Beth Smith, General Manager, Analytics Platform, IBM Analytics. "Scientists and engineers at IBM will work with Databricks and the Apache Spark community to rapidly accelerate access and breadth of machine learning capabilities and drive speed-to-innovation in the development of smart business apps."

"The size and scale of companies that are partnering with Databricks to support the Spark movement is both inspiring and validating," said Ion Stoica, CEO at Databricks. "We are looking forward to IBM becoming a key member of the Spark community, as seen by their investment in a Spark Technology Center in San Francisco. This collaboration will help Spark continue to gain mainstream adoption and deliver next-generation big data analytics and applications."

About Databricks:

Databricks' vision is to dramatically simplify big data processing. It was founded by the team that created and continues to drive Apache Spark, a powerful open source data processing engine built for sophisticated analytics, ease of use, and speed. Databricks offers a cloud platform that makes it easy to turn data into value, from ingest to production, without the hassle of managing complex infrastructure, systems and tools. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, visit



