On a rainy and foggy Saturday morning, April 30th, in McLean, VA, more than 275 Apache Spark enthusiasts, forsaking the comfort of a Saturday sleep-in, eagerly lined up at 8:00 am to register for the inaugural all-day Spark Saturday DC Meetup at Capital One’s headquarters. That turnout affirms that the fervor for Apache Spark is tangible, the hunger for it insatiable, and its popularity uncontested.
A Databricks t-shirt reads: “May the Spark be with you.” So it was for the entire day, and so was the voice of its “Father.”
Here are some highlights, broken down into architecture, Spark use-cases, and miscellaneous.
- In his keynote, Chris D’Agostino, vice president of technology at Capital One, shared his vision of cloud architecture and how his team is using Apache Spark in the cloud.
- Databricks’ Vida Ha struck a chord with practitioners with her presentation, “Not your Father’s Database: How to use Apache Spark Properly in your Big Data Architecture,” which was followed by questions about Spark 2.0.
Machine Learning, Data Science & Spark Use Cases
- In his mesmerizing talk on neuroscience, Jeremy Freeman, from HHMI | Janelia Research Campus, referred to a Databricks blog post on k-means (Introducing streaming k-means in Spark 1.2) and showed how Spark is being used for analytics in neuroscience: understanding behavioral patterns by analyzing neural activity and mapping neurons.
- Alexis Seigneurin, big data engineer at IpponUSA, talked about machine learning for record linkage, drawing on a Spark ML use case, and enumerated the do’s and don’ts of SparkContext, DataFrames, Datasets and RDDs.
- Michal Malohlava, software engineer at H2O.ai, gave us a flavor of H2O.ai’s Sparkling Water.
- Hollin Wilkins and Mikhail Semeniuk from TrueCar jointly showcased MLeap, an open-source Spark Package for easily deploying Spark ML pipelines into production.
- Saurabh Gupte from Capital One showed how to use the Spark framework for rapid use-case development for devices, with a live demo in which the audience interacted using their smartphones.
- A speakers’ panel, moderated by Donna Fernandez of MetiStream, answered a wide variety of questions from the audience, ranging from Apache Spark 2.0 and the machine learning and data science roadmap to trends in Spark Streaming.
- Denny Lee of Databricks received a well-deserved accolade from the Washington DC Area Spark Interactive Meetup.
- During the lunch hour, close to 100 attendees took part in an hour-long hands-on training session conducted by certified MetiStream trainers. Attendees used Databricks Community Edition for an Introduction to Apache Spark.
In summation, Spark Saturday DC was a fun, worthwhile community event. Attendees have posted nothing but complimentary comments on the Meetup page and under @SparkSaturdayDC/#SparkSaturdayDC.
Again, we want to thank our host and co-organizer, Capital One, and their team for executing this community event, along with MetiStream and everyone who volunteered and attended, for making this inaugural event a huge success.
We are working with other Spark Meetup organizers to plan the next Spark Saturdays in other cities. Stay tuned.