
Apache Spark 1.4 was released on June 11, and one of its exciting new features is SparkR. I am happy to announce that we now support R notebooks and SparkR in Databricks, our hosted Spark service. Databricks lets you easily use SparkR either in an interactive notebook environment or in standalone jobs.

R and Spark nicely complement each other for several important use cases in statistics and data science. Databricks R Notebooks include the SparkR package by default so that data scientists can effortlessly benefit from the power of Apache Spark in their R analyses. In addition to SparkR, any R package can be easily installed into the notebook. In this blog post, I will highlight a few of the features in our R Notebooks.

Getting Started with SparkR


To get started with R in Databricks, simply choose R as the language when creating a notebook. Since SparkR is a recent addition to Spark, remember to attach the R notebook to a cluster running Spark version 1.4 or later. The SparkR package is imported and configured by default, so you can run Spark queries directly in R.
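As a minimal sketch of such a query, the snippet below loads R's built-in `faithful` data set into Spark and queries it with SQL. It assumes the Spark 1.4-era SparkR API and that a `sqlContext` object is already defined in the notebook; the table name `faithful` and the variable names are chosen purely for illustration, and later Spark releases changed some of these signatures.

```r
# Assumption: SparkR is already loaded and `sqlContext` is predefined,
# as would be expected in an R notebook attached to a Spark 1.4 cluster.
# Outside a notebook you would first create one yourself, e.g.:
#   library(SparkR); sc <- sparkR.init(); sqlContext <- sparkRSQL.init(sc)

# Turn R's built-in `faithful` data set into a distributed SparkR DataFrame.
df <- createDataFrame(sqlContext, faithful)

# Register it as a temporary table so it can be queried with SQL.
# The table name "faithful" is an illustrative choice.
registerTempTable(df, "faithful")

# Run a Spark SQL query from R; the result is another SparkR DataFrame.
longWaits <- sql(sqlContext,
                 "SELECT eruptions, waiting FROM faithful WHERE waiting > 70")

# head() fetches the first few rows to the driver as a local R data frame.
firstRows <- head(longWaits)
print(firstRows)
```

The query itself runs on the cluster; only the handful of rows requested by `head()` is returned to the notebook.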

Using SparkR you can access and manipulate very large data sets (e.g., terabytes of data) from distributed storage (e.g., Amazon S3) or data warehouses (e.g., Hive).

airlinesDF 

SparkR offers distributed DataFrames that are syntax-compatible with R data frames. You can also collect a SparkR DataFrame into a local R data frame.
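To illustrate the R-like syntax and the collect step, here is a small sketch using R's built-in `faithful` data set. It assumes the Spark 1.4-era SparkR API and a predefined `sqlContext`; the variable names are illustrative only.

```r
# Assumption: SparkR is loaded and `sqlContext` is predefined (Spark 1.4-era API).

# Create a distributed SparkR DataFrame from a local R data frame.
df <- createDataFrame(sqlContext, faithful)

# Filter rows and select columns with R-like syntax; this runs on the cluster.
shortWaits <- select(filter(df, df$waiting < 50), "eruptions", "waiting")

# Aggregate on the cluster: count observations for each waiting time.
counts <- summarize(groupBy(df, df$waiting), count = n(df$waiting))

# collect() brings a distributed result back as an ordinary local R data frame.
localDF <- collect(shortWaits)
localCounts <- collect(counts)

# From here on, regular R functions apply.
summary(localDF$eruptions)
```

Collecting pulls the entire result onto the driver, so it is usually best to filter or aggregate on the cluster first and collect only the reduced result.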

Introducing R Notebooks in Databricks

July 13, 2015 by Hossein Falaki in Company Blog