Apache® Spark™ Growth Is Pervasive as Users Embrace Public Cloud, Machine Learning, and Streaming Capabilities

Databricks-Conducted Survey Reveals Spark Is Increasingly Being Used in Production

September 27, 2016

SAN FRANCISCO, CA--(Marketwired - Sep 27, 2016) - Databricks, the company founded by the creators of the Apache Spark project, today released the findings of its second annual Apache Spark survey, which examines how enterprises and users are using the data analytics and processing engine. The 2016 Databricks Apache Spark Survey collected more than 1,600 responses from 900 organizations. The results show a rise in deployments of Spark in the public cloud, increased usage of Spark across industry verticals, and an uptick in the use of Spark Streaming and machine learning. The survey also reveals that most developers employ two or more Spark components simultaneously to build increasingly sophisticated solutions.

Spark remains the most active open-source project in the big data space today, with over 1,000 contributors from more than 250 organizations. Spark's adoption continued to accelerate throughout the past year, and its growth continues across industries, where people in various functional roles use it to build sophisticated data solutions. In fact, Spark has moved well beyond the early-adopter phase and is now considered mainstream in large data-driven enterprises in sectors such as banking, medicine, biotech, and pharmacy.

Download the full report here:

"Since inception, Spark's core mission has been to make big data simple and accessible for everyone -- for organizations of all sizes and across all industries. And we have not deviated from that mission," said Matei Zaharia, creator of Apache Spark and Databricks' Chief Technologist. "I'm excited to see more Apache Spark deployments in the cloud and interest from users to build real-time applications using Spark Streaming, machine learning libraries, and other components, tackling complex problems across a broad range of industries."

Key findings from the survey include:

  • Spark adoption and community growth accelerate: Spark Meetup membership more than tripled since last year, from 66,000 to 225,000 members. The number of companies represented at Spark Summit grew from 1,144 to 1,888, and the number of code contributors grew from 600 to 1,000. This suggests a thriving and growing Spark community.
  • Spark's deployment in the public cloud rises: The survey confirms the rise of cloud computing across industries. Spark deployments in the public cloud jumped from 51 percent (in 2015) to 61 percent (in 2016), whereas the percentage of Spark deployments dropped for on-premises cluster managers: standalone (48 percent in 2015 to 42 percent in 2016), YARN (40 percent to 36 percent) and Mesos (11 percent in 2015 to 7 percent in 2016).
  • Spark Streaming and machine learning usage surges: As investments in fast data analytics surge, more than half of respondents (51 percent) indicate that Spark Streaming is an essential component for building real-time streaming and analytical solutions. Compared to 2015, the production use of Spark Streaming grew by 57 percent, and that of MLlib grew by 38 percent.
  • Spark's usage increases in production: Spark's community uses many components in production for building sophisticated applications. Aside from Spark Streaming and machine learning, mentioned above, Spark DataFrames saw the largest increase in production usage, from 15 percent to 38 percent. Spark SQL also rose, from 24 percent to 40 percent.
  • Spark is attracting diverse users across big data analytics: Spark's adoption is growing across professional roles due to its ease of use and its support for common programming languages. Among the languages that Spark supports, R saw an increase in use from 18 percent in 2015 to 20 percent in 2016, as did SQL from 36 percent in 2015 to 40 percent in 2016, suggesting that new users include not only data engineers but also data analysts. Also, Windows users of Spark increased from 23 percent in 2015 to 32 percent in 2016.
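
The findings above mix two kinds of change: shifts in share (for example, 51 percent to 61 percent) and growth rates (for example, "grew by 57 percent"). As a rough illustration only, the Python sketch below computes both the percentage-point change and the relative change for the shares quoted in the list. The dictionary keys and function names are hypothetical labels chosen for this example and are not part of the survey. The Spark Streaming and MLlib growth figures are left out because the release does not give their underlying 2015 and 2016 shares.

```python
# Shares quoted in the key findings (percent of respondents, 2015 -> 2016).
# The labels are illustrative names for this sketch, not survey terminology.
survey_shares = {
    "public cloud": (51, 61),
    "standalone": (48, 42),
    "YARN": (40, 36),
    "Mesos": (11, 7),
    "DataFrames in production": (15, 38),
    "Spark SQL in production": (24, 40),
    "R": (18, 20),
    "SQL": (36, 40),
    "Windows": (23, 32),
}


def point_change(before, after):
    """Change in share, in percentage points."""
    return after - before


def relative_change(before, after):
    """Change relative to the earlier share, in percent."""
    return (after - before) / before * 100.0


def summarize(shares):
    """Map each label to (percentage-point change, relative change rounded to 0.1)."""
    return {
        name: (point_change(before, after), round(relative_change(before, after), 1))
        for name, (before, after) in shares.items()
    }


if __name__ == "__main__":
    for name, (points, relative) in summarize(survey_shares).items():
        print(f"{name}: {points:+d} points, {relative:+.1f}% relative")
```

Read this way, the public cloud figure is a gain of 10 percentage points, which is a relative increase of roughly 20 percent over the 2015 share.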

"As Spark becomes easier, faster, and smarter outside the Web Industry, a newer audience is adopting it, as results from the survey suggest," said Reynold Xin, chief architect and co-founder at Databricks. "Performance, ease-of-use, streaming, and reliability top the list as the most important features. These attributes make Spark an attractive engine for performing advanced analytics across industry verticals in solving complex data problems, by users from different functional roles."

About the survey:
A total of 1,615 respondents from 900 distinct organizations took part in this survey. Of the roles represented, 41 percent of respondents identified themselves as data engineers, 23 percent as data scientists, and 21 percent as architects; the rest came from technical management and academia. Survey respondents were predominantly Apache Spark users.

Additional resources:

About Databricks

Databricks' vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team that created Apache® Spark™, a powerful open source data processing engine built for sophisticated analytics, ease of use, and speed. Databricks is the largest contributor to the open source Apache Spark project, providing 10x more code than any other company. The company has also trained over 20,000 users on Apache Spark, and has the largest number of customers deploying Spark to date. Databricks provides a just-in-time data platform to simplify data integration, real-time experimentation, and robust deployment of production applications. Databricks is venture-backed by Andreessen Horowitz and NEA. For more information, contact [email protected].
