Apache Spark™ Under the Hood

Getting started with core architecture and basic concepts

Apache Spark™ has seen immense growth over the past several years, becoming the de facto data processing and AI engine in enterprises today thanks to its speed, ease of use, and sophisticated analytics. Spark unifies data and AI by simplifying data preparation at massive scale across various sources, providing a consistent set of APIs for both data engineering and data science workloads, and integrating seamlessly with popular AI tools such as TensorFlow, PyTorch, R, and scikit-learn.

Databricks, founded by the team that originally created Apache Spark, is proud to share excerpts from the book Spark: The Definitive Guide. Enjoy this free mini eBook, courtesy of Databricks.

In this eBook, we cover:

  • The past, present, and future of Apache Spark.
  • Basic steps to install and run Spark yourself.
  • A summary of Spark’s core architecture and concepts.
  • Spark’s language APIs and how you can use them.

Get the eBook to learn more.