Top 10 Apache Spark Blog Posts from 2016
by Jules Damji
December 30, 2016 in Engineering Blog
Spark Summit will be held in Dublin, Ireland on Oct 24-26, 2017. Get your ticket before it sells out!
Here’s our recap of what has transpired with Apache Spark since our previous digest. This digest covers the top ten Apache Spark blog posts of 2016, along with release announcements and other noteworthy events.
Top Ten Apache Spark Blogs
- Apache Spark as a Compiler: Joining a Billion Rows per Second on a Laptop
- A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets
- Introducing Apache Spark Datasets
- Introducing GraphFrames
- Introducing Apache Spark 2.0
- Structured Streaming In Apache Spark
- Apache Spark 2.0 Preview: Machine Learning Model Persistence
- Apache Spark @Scale: A 60 TB+ production use case from Facebook
- Scalable Partition Handling for Cloud-Native Architecture in Apache Spark 2.1
- Deep Learning on Databricks
Releases
- Apache Spark 2.1.0 released, with additional support for Structured Streaming and Apache Kafka 0.10.0. Try it on Databricks.
Webinar
- Joseph Bradley and I presented Apache Spark MLlib 2.x: Migrating ML Workloads to DataFrames, and posted a follow-up questions and answers blog post.
Events
- Tathagata Das of Databricks presented “Deep Dive in Structured Streaming” at Apache Spark Meetup Lisbon.
What’s Next
To stay abreast of what’s happening with Apache Spark, follow us on Twitter @databricks and visit SparkHub.