What is Hosted Spark?

Unified data platform with REST API access to Spark clusters for remote applications, enabling interactive and batch data exploration in multiple languages

Summary

  • Supports interactive Scala, Python, R shells and batch submissions in Scala, Java, Python through REST APIs, allowing multiple users to share servers and submit jobs from anywhere without code changes
  • Provides turnkey interaction between Spark and application servers: services that connect remote applications efficiently to Spark clusters streamline the architecture required by interactive web and mobile apps
  • Provides high-level APIs in multiple languages and an optimized engine that supports general computation graphs, plus Spark SQL, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for comprehensive data analysis

What is Hosted Spark?

Apache Spark is a fast, general-purpose cluster computing system for big data, built around speed, ease of use, and advanced analytics. It was originally developed in 2009 at UC Berkeley. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports several higher-level tools, including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.

Spark Provides Two Modes for Data Exploration:

  • Interactive
  • Batch

To simplify end-user interaction, Spark is also offered to organizations as part of a unified, hosted data platform. When remote applications had no direct access to Spark resources, users faced a longer route to production. To overcome this obstacle, services have been created that let remote applications connect efficiently to a Spark cluster over a REST API from anywhere. These interfaces execute code snippets or complete programs in a Spark context that runs locally or in Apache Hadoop YARN. Hosted Spark interfaces have proved to be turnkey solutions: they simplify the interaction between Spark and application servers, streamlining the architecture required by interactive web and mobile apps.
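The interactive REST workflow described above can be sketched in Python. This is a minimal sketch that assumes a server following the Apache Livy REST API (`POST /sessions`, `POST /sessions/{id}/statements`); Livy is one open-source service of this kind, but the text above does not name a specific product, and the URL and helper function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Optional

# Hypothetical endpoint: an Apache Livy-style server (Livy's default port is 8998).
LIVY_URL = "http://localhost:8998"

# States after which a session or statement will never become ready.
TERMINAL_STATES = {"error", "dead", "killed", "cancelled"}


def build_request(base_url: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON request: POST when a payload is given, otherwise GET."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if payload is not None else "GET",
    )


def create_session_request(base_url: str, kind: str = "pyspark") -> urllib.request.Request:
    # POST /sessions asks the server to start an interactive Spark context.
    return build_request(base_url, "/sessions", {"kind": kind})


def statement_request(base_url: str, session_id: int, code: str) -> urllib.request.Request:
    # POST /sessions/{id}/statements runs a code snippet inside that context.
    return build_request(base_url, f"/sessions/{session_id}/statements", {"code": code})


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for(base_url: str, path: str, ready: str, poll_seconds: float = 1.0) -> dict:
    """Poll GET {path} until its "state" field equals `ready`."""
    while True:
        body = send(build_request(base_url, path))
        if body["state"] == ready:
            return body
        if body["state"] in TERMINAL_STATES:
            raise RuntimeError(f"{path} ended in state {body['state']!r}")
        time.sleep(poll_seconds)


def run_snippet(base_url: str, code: str) -> dict:
    """Start a session, run one snippet remotely, and return its output."""
    session_id = send(create_session_request(base_url))["id"]
    wait_for(base_url, f"/sessions/{session_id}", ready="idle")
    statement_id = send(statement_request(base_url, session_id, code))["id"]
    result = wait_for(
        base_url, f"/sessions/{session_id}/statements/{statement_id}", ready="available"
    )
    return result["output"]
```

With a server running, a call such as `run_snippet(LIVY_URL, "spark.range(100).count()")` would execute the snippet on the cluster rather than in the calling application. Request construction is kept separate from sending so the payloads can be inspected without a server. This sketch leaves the session running for reuse; a real client would eventually close it (Livy exposes `DELETE /sessions/{id}` for this).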

Hosted Spark Services Provide These Features:

  • Interactive Scala, Python, and R shells
  • Batch submissions in Scala, Java, and Python
  • Multiple users can share the same server
  • Users can submit jobs from anywhere through REST
  • No changes to your programs are required
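Batch submission and server sharing can be sketched in the same way. This minimal sketch again assumes an Apache Livy-style `POST /batches` endpoint; the server URL, file paths, class name, and user name are hypothetical. The existing program file is referenced as-is, which is one way such a service can run jobs without code changes, and the optional `proxyUser` field (as in Livy) lets several users submit through one shared server.

```python
import json
import urllib.request
from typing import List, Optional

# Hypothetical endpoint: an Apache Livy-style server.
LIVY_URL = "http://localhost:8998"


def batch_request(
    base_url: str,
    file: str,
    class_name: Optional[str] = None,
    args: Optional[List[str]] = None,
    proxy_user: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST /batches request that submits an existing program unchanged."""
    payload: dict = {"file": file}  # JAR or Python file readable by the cluster
    if class_name is not None:
        payload["className"] = class_name  # main class for a Scala/Java JAR
    if args:
        payload["args"] = list(args)  # command-line arguments for the program
    if proxy_user is not None:
        payload["proxyUser"] = proxy_user  # run as this user on a shared server
    return urllib.request.Request(
        base_url + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_batch(req: urllib.request.Request) -> dict:
    """Send the request; the JSON response describes the new batch, including its id."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `submit_batch(batch_request(LIVY_URL, "hdfs:///jobs/etl.jar", class_name="com.example.Etl", proxy_user="alice"))` would queue a compiled job for a hypothetical user; a Python job would pass a `.py` file and omit `class_name`. The batch's progress could then be polled with `GET /batches/{id}`.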

With these services, organizations can overcome the bottlenecks that impede their ability to operationalize Spark and focus instead on capturing the value promised by big data.
