Join us for a four-part learning series: Introduction to Data Analysis for Aspiring Data Scientists. This self-paced online workshop series is for anyone and everyone interested in learning about data analysis. No previous programming experience required.
Each workshop page contains the session video recording, transcripts, speaker info, and a GitHub link to access the notebooks and resources. We suggest you start with Part One, Introduction to Python, and continue from there in order because each workshop builds upon the last.
In this workshop, we will show you the simple steps needed to program in Python using a notebook environment on the free Databricks Community Edition. This workshop covers major foundational concepts necessary for you to start coding in Python, with a focus on data analysis. No prior programming knowledge is required.
This workshop is on pandas, a powerful open-source Python package for data analysis and manipulation. In this workshop, you will learn how to read data, compute summary statistics, check data distributions, conduct basic data cleaning and transformation, and plot simple visualizations. Although no prep work is required, we do recommend basic Python knowledge. To learn about Python, watch Part One, Introduction to Python.
scikit-learn is one of the most popular open-source machine learning libraries among data science practitioners. This workshop will walk through what machine learning is, the different types of machine learning, and how to build a simple machine learning model. This workshop focuses on the techniques of applying and evaluating machine learning methods, rather than the statistical concepts behind them.
This workshop covers the fundamentals of Apache Spark, one of the most popular big data processing engines. In this workshop, you will learn how to ingest data with Spark, analyze the Spark UI, and gain a better understanding of distributed computing. No prior knowledge of Spark is required, but Python experience is highly recommended.