Sparklyr is an open-source R package that provides an interface between R and Apache Spark. Because Spark can work with distributed data at low latency, sparklyr lets you use Spark’s capabilities from a modern R environment and makes it practical to explore large datasets interactively. You can analyze data held in Spark with the familiar tools of R, giving you the best of both worlds. Through sparklyr, Spark can serve as the backend for dplyr, a popular data manipulation package. Sparklyr also provides a range of functions for accessing Spark’s tools for transforming and pre-processing data, along with interfaces to Spark’s distributed machine learning algorithms and much more. Sparklyr is also extensible: you can create R packages that depend on sparklyr to call the full Spark API. One such extension is H2O’s rsparkling, an R package that works with H2O’s machine learning algorithms.
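
As a rough illustration of the workflow described above, the following minimal sketch connects to Spark, runs a dplyr query against a Spark table, and fits one of Spark’s machine learning models. It assumes a local Spark installation (for example, one set up with `spark_install()`) and uses R’s built-in `mtcars` dataset purely as stand-in data; the table name and the choice of model are illustrative assumptions, not part of any particular project.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Copy a small built-in data frame into Spark; "mtcars" is an illustrative table name.
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and executed by Spark, not by R.
mpg_by_cyl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl)

# collect() brings the (small) result back into R as a local data frame.
collect(mpg_by_cyl)

# Fit a linear regression with Spark's distributed machine learning library.
fit <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)
summary(fit)

spark_disconnect(sc)
```

With a real cluster, the same code would typically change only in the `master` argument passed to `spark_connect()`; the dplyr pipeline and the model call stay as they are.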