Data Analysis Workshop Series
Introduction to Apache Spark


Workshop Details
This workshop is the final part of our Introduction to Data Analysis for Aspiring Data Scientists Workshop Series.
This workshop covers the fundamentals of Apache Spark, one of the most popular big data processing engines. You will learn how to ingest data with Spark, analyze the Spark UI, and gain a better understanding of distributed computing. We will be using the COVID-19 data released by The New York Times (https://github.com/nytimes/covid-19-data). No prior knowledge of Spark is required, but Python experience is highly recommended.
What you need: