R is a widely used statistical programming language, but its interactive use is typically limited to a single machine. To enable large-scale data analysis from R, we will present SparkR, an open-source R package developed at UC Berkeley that allows data scientists to analyze large data sets and interactively run jobs on them from the R shell. This talk will introduce SparkR, discuss some of its features, and highlight the power of combining R’s interactive console and extension packages with Spark’s distributed runtime.
Shivaram Venkataraman is currently a post-doctoral researcher at Microsoft Research, Redmond, and, starting in Fall 2018, will be an assistant professor in Computer Science at the University of Wisconsin, Madison. He received his PhD from the University of California, Berkeley, where he was advised by Mike Franklin and Ion Stoica. His work spans distributed systems, operating systems, and machine learning, and his recent research has looked at designing systems and algorithms for large-scale data analysis.