This self-paced guide is the “Hello World” tutorial for Apache Spark using Databricks. In the following tutorial modules, you will learn the basics of creating Spark jobs, loading data, and working with data. You’ll also get an introduction to running machine learning algorithms and working with streaming data. Databricks lets you start writing Spark queries instantly so you can focus on your data problems.
Hover over the above navigation bar and you will see the six stages to getting started with Apache Spark on Databricks. This guide will first provide a quick start on how to use open source Apache Spark and then leverage this knowledge to learn how to use Spark DataFrames with Spark SQL. We also will discuss how to use Datasets and how DataFrames and Datasets are now unified. The guide also has quick starts for Machine Learning and Streaming so you can easily apply them to your data problems. Each of these modules refers to standalone usage scenarios—including IoT and home sales—with notebooks and datasets so you can jump ahead if you feel comfortable.
Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics.
“At Databricks, we’re working hard to make Spark easier to use and run than ever, through our efforts on both the Spark codebase and support materials around it. All of our work on Spark is open source and goes directly to Apache.”
Matei Zaharia, VP, Apache Spark,
Co-founder & Chief Technologist, Databricks
For more information about Spark, you can also reference:
Databricks is a Unified Analytics Platform on top of Apache Spark that accelerates innovation by unifying data science, engineering and business. With our fully managed Spark clusters in the cloud, you can easily provision clusters with just a few clicks. Databricks incorporates an integrated workspace for exploration and visualization so users can learn, work, and collaborate in a single, easy to use environment. You can easily schedule any existing notebook or locally developed Spark code to go from prototype to production without re-engineering.Sign up Today
In addition, Databricks includes:
Find all of our available courses here at https://academy.databricks.com
This series of tech talk tutorials takes you through the technology foundation of Delta Lake (Apache Spark) and the capabilities Delta Lake adds to it to power cloud data lakes.