What Lies Beneath Apache Spark’s RDD API (Using Spark-shell and WebUI)

This talk introduces Spark through the near-low-level details of RDDs and the jobs that actions trigger. It is a deep dive into what happens after a simple spark-shell execution and how Spark distributes tasks among executors. It also demonstrates the difference between Spark’s local mode and clusters, and how stages are created for a Spark user program, using the Spark shell and web UI. It should be as useful for developers as for administrators who would like to dig deeper into Apache Spark under the surface of the RDD API. The approach is to demonstrate what is behind a simple “spark-shell --master” and to learn Spark from another, non-API perspective. The talk is a summary of what I learnt about the architecture of Apache Spark from reviewing Spark’s source code and writing the notes at https://jaceklaskowski.gitbooks.io/mastering-apache-spark/.

About Jacek Laskowski

Jacek is an independent consultant who offers development and training services for Apache Spark (and Scala and sbt, with a bit of Hadoop YARN, Apache Kafka, Apache Hive, Apache Mesos, Akka Actors/Stream/HTTP, and Docker). He leads the Warsaw Scala Enthusiasts and Warsaw Spark meetups. His latest project is to gain an in-depth understanding of Apache Spark, with notes at https://jaceklaskowski.gitbooks.io/mastering-apache-spark/.