
What is SparkR?

Run R programs at scale using Apache Spark's distributed computing engine with familiar R syntax

by Databricks Staff

  • SparkR lets R users run distributed data processing on Apache Spark using familiar R syntax so they can scale analysis beyond what fits in local memory.
  • SparkR follows the same principles as other Spark language APIs, exposing core capabilities through an R package that can be imported directly into existing workflows.
  • Most features available to Python users are also available in SparkR, making it straightforward for R data scientists to work with big data on Databricks clusters.

SparkR is a tool for running R on Spark. It follows the same principles as Spark’s other language bindings: import the package into your environment and run your code. The API closely mirrors the Python API, except that it uses R’s syntax rather than Python’s. For the most part, almost everything available in Python is also available in SparkR.
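To illustrate, here is a minimal sketch of that workflow: load the SparkR package, start (or attach to) a Spark session, and run a familiar dplyr-style aggregation on a distributed DataFrame. The `appName` value is arbitrary, and on a Databricks cluster the session already exists, so `sparkR.session()` simply attaches to it.

```r
library(SparkR)

# Start or attach to a Spark session (already running on Databricks clusters)
sparkR.session(appName = "SparkR-example")

# Convert a local R data.frame into a distributed Spark DataFrame
df <- as.DataFrame(faithful)

# Aggregate with SparkR's DataFrame API; head() collects a few rows locally
head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))

sparkR.session.stop()
```

The key difference from plain R is that `df` is distributed across the cluster: operations like `groupBy` and `summarize` are planned and executed by Spark, and only the small result of `head()` is brought back to the local R process.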
 
