Spark + AI Summit Europe 2019 came to Amsterdam this past week! Over 2,300 data scientists, data engineers, and global business leaders from 63 different countries descended upon the RAI Amsterdam Convention Centre for the latest community and open source developments around Apache Spark™, Delta Lake, MLflow, Koalas, and more. Check out the keynote recordings to learn more about the latest announcements and community updates from this sold-out event!
Open Source Updates: Delta Lake joins the Linux Foundation, Apache Spark™ 3.0 plans, MLflow Model Registry, and more
At Spark Summit Europe 2019, we learned about some exciting new developments with several open source Apache Spark™ projects that the community has been eagerly awaiting.
First up, we heard from Ali Ghodsi, CEO and Co-founder of Databricks, on the tough problems that data scientists, data engineers and business analysts face head-on every day. In his keynote address, entitled Unified Data Analytics: Helping Data Teams Solve the World’s Toughest Problems, Ali laid out an expansive vision for the future of big data and AI that is not to be missed.
Delta Lake joins the Linux Foundation
In his keynote address, New Developments in the Open Source Ecosystem, Principal Software Engineer at Databricks Michael Armbrust shared plans for the continued growth of open source Delta Lake, highlighting the increasingly rapid adoption of this promising technology. Michael was pleased to report that over 3,700 organizations are already using Delta, and more than 2 billion gigabytes (2 exabytes, you read that right) of data are processed with it each month.
To top it off, in a surprise announcement, Michael told the crowd in Amsterdam that Delta Lake is joining the Linux Foundation, to help continue to drive adoption of Delta Lake and growth of the open source community!
Punctuating Michael’s point, Senior Software Engineer Burak Yavuz walked the audience through a Delta Lake demo, expertly showcasing Delta’s capability and power.
Apache Spark™ 3.0 upcoming enhancements
In addition to the exciting news about Delta Lake, Michael also shared several new developments about the upcoming release of Apache Spark™ 3.0, including significant performance improvements coming to the Spark SQL Optimizer. These enhancements, which include partition pruning and the use of broadcast joins for certain merge operations, can speed up some queries by up to 17x. Finally, Michael introduced a new federated data catalog for Spark.
MLflow Model Registry update
Not to be outdone, Chief Technologist and Co-founder of Databricks Matei Zaharia discussed the importance of the new Model Registry to the MLflow ecosystem. In his keynote address, Simplifying Model Management with MLflow, the original creator of Apache Spark™ explained how the MLflow Model Registry helps data teams organize and productionize different versions of machine learning models by offering a collaborative repository where named ML models can be saved and versioned.
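To make the idea of "named models that are saved and versioned" concrete, here is a toy, in-memory sketch in plain Python. It is not the MLflow API: the `ToyModelRegistry` class, its `register` and `get` methods, and the `churn-classifier` name are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class ToyModelRegistry:
    """Hypothetical in-memory registry: each name maps to an ordered list of versions.

    This is an illustration of the concept only, not how MLflow is implemented.
    """

    def __init__(self):
        self._versions = defaultdict(list)

    def register(self, name, model):
        """Save `model` under `name` and return its new 1-based version number."""
        self._versions[name].append(model)
        return len(self._versions[name])

    def get(self, name, version=None):
        """Return the requested version, or the latest one when `version` is None."""
        versions = self._versions.get(name)
        if not versions:
            raise KeyError(f"no model registered under {name!r}")
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name!r} has no version {version}")
        return versions[version - 1]


# Plain dicts stand in for trained models in this sketch.
registry = ToyModelRegistry()
v1 = registry.register("churn-classifier", {"threshold": 0.5})
v2 = registry.register("churn-classifier", {"threshold": 0.7})
latest = registry.get("churn-classifier")
first = registry.get("churn-classifier", version=1)
```

Registering the same name twice keeps both versions, so a team can keep serving an earlier version while a newer one is evaluated.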
The Model Registry also makes it possible for data engineers and data scientists to implement flexible CI/CD pipelines, which Databricks Software Engineer Corey Zumar was kind enough to demo for the crowd. Learn more about the MLflow Model Registry here.
Koalas community growth and adoption
We also heard from Databricks’ own Principal Consultant Brooke Wenig on the continuing success of the Koalas open source project, which aims to bring the power of Apache Spark™ to users of pandas, the popular Python data analysis library. Open source community members now download Koalas more than 10,000 times per day, and the project has experienced over 100% month-over-month growth since its inception. We were also treated to a live demonstration showing how easy it is to transition from single-node data science on pandas to multi-node data science on Spark using Koalas. Learn more about this exciting open source project here.
Keynotes: Katie Bouman on creating the first black hole image, Gaël Varoquaux on the “secret weapon” of scikit-learn’s success, and much more
This year’s Spark + AI Summit Europe featured a keynote from none other than Katie Bouman, Assistant Professor of Computing and Mathematical Sciences at Caltech. In her talk, Imaging the Unseen: Taking the First Picture of a Black Hole, Katie shared with us the process that she and her team used to capture the first image of a black hole with the Event Horizon Telescope.
We were also lucky enough to hear from Gaël Varoquaux, Creator of scikit-learn and Faculty Researcher at Inria. In Gaël’s keynote, Democratizing Machine Learning: Perspective From a scikit-learn Creator, he explained the simple principles, including one that he calls a “secret weapon,” that have been key to scikit-learn’s runaway success.
Oriol Vinyals, Principal Scientist at Google DeepMind and former member of the Google Brain team, shared a fascinating story with us in his talk Project AlphaStar: mastering the real-time strategy game StarCraft II with AI.
The lively audience in Amsterdam was also treated to talks from other legends and luminaries, including:
- Alessio Basso, PayMe/HSBC, on reinventing payments at HSBC with a unified platform for data and AI in the cloud
- Johan Vallin, Electrolux, on forecasting ‘what-if’ scenarios in retail using ML-powered interactive tools
- Dr. Stephen Galsworthy, Quby, on saving energy in homes with a unified approach to data and AI
- Mark Hamilton and Christina Lee, Microsoft, on Microsoft’s AI For Good Initiative
Women in Unified Analytics: Panel discussions and networking at Spark + AI in Amsterdam
This year’s Spark Summit also gathered Women in Unified Analytics and allies, providing an opportunity for women in big data, data science, machine learning and AI to connect, learn, and network. Female leaders from Microsoft, Lloyds Bank, Wehkamp, Centrum Wiskunde & Informatica, ARM, Depop, and more met for tech talks and a panel discussion. They covered topics including technology trends, diversity & inclusion, ethical AI, and career development for women.
Spark Summit Europe Community Sessions: Spark tuning workshops, technical tutorials, big data case studies, and more
In addition to the keynote presentations, attendees at this year’s Spark + AI Summit Europe were treated to over 140 different community sessions and instructor-led trainings. These community sessions featured speakers from organizations like KTH, Socialbakers, Airbnb, Eventbrite, GetYourGuide, H&M, CERN, La Poste, Klario, Facebook, Societe Generale, Canal+, Nielsen, and more. The technical sessions covered a wide range of use cases and best practices, with hands-on tutorials on topics including deep learning, structured streaming, Apache Spark™ tuning, Delta Lake, MLflow, and more.
What’s Next for Spark + AI Summit
Spark + AI Summit Europe 2019 keynote videos are now available! To see the newest product announcements and thought leadership, follow @Databricks on Twitter or subscribe to our newsletter. You can also learn Apache Spark, Delta Lake, and MLflow today on our free Databricks Community Edition, or build a production data application by trying Databricks today for free.
As always, thanks for your support, and we look forward to seeing you again stateside, at the upcoming Spark + AI Summit in San Francisco on June 23-25, 2020!