SAN FRANCISCO – July 1, 2014 – Databricks, the company founded by the team that created Apache Spark, the popular open-source processing engine that began as a research project at UC Berkeley, today announced a new partnership with SAP (NYSE: SAP) to deliver a Databricks-certified Apache Spark distribution offering for the SAP HANA® platform. The full production-ready distribution offering, based on Apache Spark 1.0, is deployable in the cloud or on premise and is available for immediate download from SAP at no cost at spr.ly/SAP_and_Spark. The announcement was made at Spark Summit 2014, held June 30 – July 2 in San Francisco.
The Databricks-certified distribution offering for SAP HANA contains the Spark processing engine, which works with any Hadoop distribution out of the box and provides a more complete data store and processing layer for Hadoop. Because the distribution is certified by Databricks as compatible with the Apache Spark distribution, the rapidly growing set of “Certified on Spark” applications can run out of the box on SAP HANA. This production-ready distribution offering is the first result of Databricks’ new partnership with SAP.
“We’re thrilled to be embarking on this journey with SAP to bring together two powerful technologies to better enable enterprises to derive value from their data,” said Ion Stoica, CEO of Databricks. “SAP HANA is both an incredibly powerful and fast analytics engine, as well as a repository for some of the most valuable enterprise data by virtue of the enterprise applications that it helps run. This integration will help enable the large and growing community of Hadoop and Spark developers and applications to harness these capabilities immediately via Spark as well as extend the reach of SAP HANA.”
SAP HANA integrated with Spark will help enable real-time applications and interactive analysis that combine corporate application data with content stored in the Hadoop Distributed File System (HDFS). Developers and data scientists building on Spark can also benefit from end-to-end data processing acceleration in SAP HANA by using its comprehensive suite of in-memory engines and libraries for transactional applications, analytics, predictive analytics, machine learning, and text, graph and geospatial analysis. This helps simplify the integration of mission-critical applications with contextual data stored in Hadoop-like data stores. As a result, in-memory computation can happen where the data resides, which can help minimize costly and time-consuming data movement.
“SAP has continually been at the forefront of innovation to simplify and better serve customers, and bringing together Spark and SAP HANA is simply the latest example of this,” said Steve Lucas, president, Platform Solutions, SAP. “This can allow enterprises to build on SAP HANA’s value proposition by providing some of the best-of-breed capabilities across the full spectrum of data and processing needs without the need to painstakingly stitch together independent solutions.”
Developers and data scientists will be able to more easily create a new class of applications with SAP HANA and Spark. For example, they can build applications that span data domains: integrating inventory analysis with social media trends for retailers; combining sensor data with billing systems to deliver personalized resource and cost-saving recommendations for utilities; or merging patient data with epidemiological information to inform better staffing decisions for healthcare providers.