October 23, 2019

Spark + AI in Amsterdam: European Summit Recap, Keynote Videos, & Announcements


Spark + AI Summit Europe 2019 came to Amsterdam this past week! Over 2,300 data scientists, data engineers, and global business leaders from 63 countries descended upon the RAI Amsterdam Convention Centre for the latest community and open source developments around Apache Spark™, Delta Lake, MLflow, Koalas, and more. Check out the keynote recordings to learn more about the latest announcements and community updates from this sold-out event!

Main stage at Spark + AI Summit Europe 2019 in Amsterdam.

Open Source Updates: Delta Lake joins the Linux Foundation, Apache Spark™ 3.0 plans, MLflow Model Registry, and more

At Spark + AI Summit Europe 2019, we learned about exciting new developments in several open source projects from the Apache Spark™ ecosystem that the community has been eagerly awaiting.

First up, we heard from Ali Ghodsi, CEO and Co-founder of Databricks, on the tough problems that data scientists, data engineers, and business analysts face head-on every day. In his keynote address, entitled Unified Data Analytics: Helping Data Teams Solve the World’s Toughest Problems, Ali laid out an expansive vision for the future of big data and AI that is not to be missed.

Ali Ghodsi addressing the audience at Spark + AI Summit Europe 2019 in Amsterdam.

Delta Lake joins the Linux Foundation

In his keynote address, New Developments in the Open Source Ecosystem, Principal Software Engineer at Databricks Michael Armbrust shared plans for the continued growth of open source Delta Lake, highlighting the increasingly rapid adoption of this promising technology. Michael was pleased to report that over 3,700 organizations are already using Delta, and more than 2 billion gigabytes (2 exabytes, you read that right) of data are processed with it each month.

Databricks' Michael Armbrust speaking onstage at Spark + AI Summit Europe 2019 in Amsterdam.

To top it off, in a surprise announcement, Michael told the crowd in Amsterdam that Delta Lake is joining the Linux Foundation, to help continue to drive adoption of Delta Lake and growth of the open source community!

Punctuating Michael's point, Senior Software Engineer Burak Yavuz walked the audience through a Delta Lake demo, expertly showcasing Delta's capability and power.

Burak Yavuz demos Delta Lake onstage at Spark Summit Europe 2019 in Amsterdam.

Apache Spark™ 3.0 upcoming enhancements

In addition to the exciting news about Delta Lake, Michael also shared several new developments about the upcoming release of Apache Spark™ 3.0, including significant performance improvements coming to the Spark SQL optimizer. These improvements, which include partition pruning and the clever use of broadcast joins for certain merge operations, can speed up some queries by as much as 17x. Finally, Michael introduced a new federated data catalog for Spark.
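To give a feel for why broadcast joins can help, here is a minimal pure-Python sketch of the general broadcast hash join technique. It is an illustration only, not Spark's implementation: the function name `broadcast_hash_join` and the sample data are hypothetical. The small table is turned into a lookup once and shared with every partition of the large table, so the large side never has to be reshuffled.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def broadcast_hash_join(
    large_partitions: Iterable[List[Row]],
    small_table: Iterable[Row],
) -> Iterator[Tuple[int, str, str]]:
    """Inner join where the small side is hashed once and reused everywhere."""
    # Build step: hash the small table by key (this is what would be "broadcast").
    lookup: Dict[int, List[str]] = {}
    for key, value in small_table:
        lookup.setdefault(key, []).append(value)

    # Probe step: each partition of the large table checks the shared lookup.
    for partition in large_partitions:
        for key, value in partition:
            for match in lookup.get(key, []):
                yield (key, value, match)


# Hypothetical data: orders split across two partitions, plus a small customer table.
orders = [[(1, "order-a"), (2, "order-b")], [(1, "order-c"), (3, "order-d")]]
customers = [(1, "alice"), (2, "bob")]

joined = list(broadcast_hash_join(orders, customers))
print(joined)
```

The order with key 3 has no matching customer, so it is dropped from the result, as expected for an inner join.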

MLflow Model Registry update

Not to be outdone, Chief Technologist and Co-founder of Databricks Matei Zaharia discussed the importance of the new Model Registry to the MLflow ecosystem. In his keynote address to the crowd at RAI Amsterdam Convention Centre entitled Simplifying Model Management with MLflow, the original creator of Apache Spark™ explained how the MLflow Model Registry allows data teams to organize and productionize different versions of machine learning models, by offering a collaborative repository where named ML models can be saved and versioned.

Databricks CTO and Co-founder Matei Zaharia presents to the crowd in Amsterdam at Spark + AI Summit Europe 2019.

The Model Registry also makes it possible for data engineers and data scientists to implement flexible CI/CD pipelines, which Databricks Software Engineer Corey Zumar was kind enough to demo for the crowd. Learn more about the MLflow Model Registry here.
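The idea of a repository where named models are saved and versioned can be pictured with a toy in-memory registry. This is a conceptual sketch only and deliberately does not use the MLflow API: the class names `ToyModelRegistry` and `ModelVersion`, the model name, and the artifact paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelVersion:
    version: int       # 1-based, assigned by the registry
    artifact_uri: str  # where the serialized model lives (made-up path)


@dataclass
class ToyModelRegistry:
    """In-memory stand-in for a model registry; NOT the MLflow API."""

    _models: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, artifact_uri: str) -> ModelVersion:
        # Registering under an existing name creates the next version.
        versions = self._models.setdefault(name, [])
        model_version = ModelVersion(len(versions) + 1, artifact_uri)
        versions.append(model_version)
        return model_version

    def latest(self, name: str) -> ModelVersion:
        return self._models[name][-1]

    def get(self, name: str, version: int) -> ModelVersion:
        return self._models[name][version - 1]


registry = ToyModelRegistry()
registry.register("churn-classifier", "runs:/abc123/model")
registry.register("churn-classifier", "runs:/def456/model")
print(registry.latest("churn-classifier").version)  # prints 2
```

Because every version stays addressable by name and number, an automated pipeline could look up a specific version to test or deploy, which is the kind of workflow the CI/CD demo pointed toward.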

Corey Zumar speaks to a packed house at Spark + AI Summit Europe 2019 in Amsterdam.

Koalas community growth and adoption

We also heard from Databricks' own Principal Consultant Brooke Wenig on the continuing success of the Koalas open source project, which implements the pandas DataFrame API on top of Apache Spark™ so that users of the popular Python data analysis library can scale their work with Spark. Koalas is being downloaded over 10,000 times per day, and the project has experienced over 100% month-over-month growth since its inception. We were also treated to a live demonstration showing how easy it is to transition from single-node data science on pandas to multi-node data science on Spark using Koalas. Learn more about this exciting open source project here.

Brooke Wenig speaks behind a podium onstage at Spark Summit Europe 2019 in Amsterdam.

Keynotes: Katie Bouman on creating the first black hole image, Gaël Varoquaux on the "secret weapon" of scikit-learn's success, and much more

This year's Spark + AI Summit Europe featured a keynote speech from none other than Katie Bouman, Assistant Professor of Computing and Mathematical Sciences at Caltech. In her keynote speech, Imaging the Unseen: Taking the First Picture of a Black Hole, Katie shared with us the process that she and her team used to capture the first image of a black hole with the Event Horizon Telescope, a global network of ground-based radio telescopes.

Katie Bouman presents to the crowd onstage at Databricks' Spark Summit 2019 in Amsterdam.

We were also lucky enough to hear from Gaël Varoquaux, Creator of scikit-learn and Faculty Researcher at Inria. In Gaël's keynote, Democratizing Machine Learning: Perspective From a scikit-learn Creator, he explained the simple principles, including one that he calls a "secret weapon," that have been key to scikit-learn's runaway success.

Gaël Varoquaux speaking onstage in front of microphones at the Amsterdam Spark Summit 2019.

Oriol Vinyals, Principal Scientist at Google DeepMind and former member of the Google Brain team, shared a fascinating story with us in his talk Project AlphaStar: mastering the real-time strategy game StarCraft II with AI.

Oriol Vinyals speaks behind the podium onstage, which reads

Customer Keynotes

The lively audience in Amsterdam was also treated to talks from other legends and luminaries, including:

Alessio Basso, PayMe/HSBC, on reinventing payments at HSBC with a unified platform for data and AI in the cloud
Johan Vallin, Electrolux, on forecasting 'what-if' scenarios in retail using ML-powered interactive tools
Dr. Stephen Galsworthy, Quby, on saving energy in homes with a unified approach to data and AI
Mark Hamilton and Christina Lee, Microsoft, on Microsoft's AI For Good Initiative

A view of the convention center theater from the balcony above, with attendees from Spark + AI Summit Europe in nearly every seat.

Women in Unified Analytics: Panel discussions and networking at Spark + AI in Amsterdam

This year’s Spark + AI Summit also gathered Women in Unified Analytics and allies, providing an opportunity for women in big data, data science, machine learning, and AI to connect, learn, and network. Female leaders from Microsoft, Lloyds Bank, Wehkamp, Centrum Wiskunde & Informatica, ARM, Depop, and more met for tech talks and panel discussions. They covered topics including technology trends, diversity & inclusion, ethical AI, and career development for women.

Photo of four women from the Women in Unified Analytics group at the Spark + AI Summit.

Spark Summit Europe Community Sessions: Spark tuning workshops, technical tutorials, big data case studies, and more

In addition to the keynote presentations, attendees at this year's Spark + AI Summit Europe were treated to over 140 community sessions and instructor-led trainings. These community sessions featured speakers from organizations such as KTH, Socialbakers, Airbnb, Eventbrite, Getyourguide, H&M, CERN, La Poste, Klario, Facebook, Societe Generale, Canal+, Nielsen, and more. The technical sessions covered a wide range of use cases and best practices, with hands-on tutorials on topics including deep learning, structured streaming, Apache Spark™ tuning, Delta Lake, MLflow, and more.

An attendee from the audience speaks into a microphone with other attendees all around him at a Spark + AI Summit training session.

What's Next for Spark + AI Summit

Spark + AI Summit Europe 2019 keynote videos are now available! To see the newest product announcements and thought leadership, follow @Databricks on Twitter or subscribe to our newsletter. You can also learn Apache Spark, Delta Lake, and MLflow today on our free Databricks Community Edition, or build a production data application by trying Databricks today for free.

Three attendees pose in front of large orange physical letters that spell out

As always, thanks for your support, and we look forward to seeing you again stateside, at the upcoming Spark + AI Summit in San Francisco on June 23-25, 2020!