
Over the years, technical conferences tend to expand beyond their initial focus, adding new technologies, types of attendees, and a broader range of sessions and speakers. From its original focus on Apache Spark™, the Spark + AI Summit has extended its scope to include not only data engineering and infrastructure, but also machine learning automation and applications.

Data, AI, and the Cloud are pillars of advanced data analytics. The Summit brings together data visionaries, Spark experts, machine learning developers, data engineers, data scientists, and data analysts to drive innovation at scale and share novel ideas with the community. With the dawn of a new decade of data, this conference broadly covers topics in data engineering and architecture, business analytics and visualization, data platforms for machine learning, and artificial intelligence industry use cases.

The Spark + AI Summit 2020 features an expansive virtual agenda covering a range of big data and AI subjects for data engineers, data scientists, and data analysts.

For the health and safety of our attendees and the larger community, this summer’s conference is now virtual and extends over five days. The agenda for this year’s virtual Summit has just been announced, and you’ll see a remarkable range of virtual sessions designed to help attendees learn how to put the latest technologies and techniques into practice. The 190+ presenters, tracks, and sessions cover not only open source projects originated by Databricks (Spark, MLflow, Delta Lake, and Koalas) but also other important open source technologies, including TensorFlow, PyTorch, the Python data ecosystem, Ray, Presto, Apache Arrow, Apache Kafka, and more.

New AI, ML and Other Sessions by Theme

Among the companies presenting are Apple, Microsoft, Facebook, Airbnb, Pinterest, LinkedIn, Capital One, Netflix, Uber, Adobe, Nvidia, Walmart, Zillow, PayPal, Visa, Target, T-Mobile, Intuit, Atlassian, Comcast, Alibaba, Tencent, and ByteDance. Here are some highlights, clustered by theme:

  • Automation and AI use cases: Learn how to use machine learning and AI technologies to automate workflows, processes, and systems. We have speakers from leading research groups and industry sectors, including IT and software, financial services, retail, logistics, IoT, and media and advertising.
  • Building, deploying, and maintaining data pipelines: As data and machine learning applications become more sophisticated, underlying data pipelines have become harder to build and maintain. We have a series of presentations on best practices and new open source projects aimed directly at helping data engineering teams build and maintain data pipelines using Spark, Delta Lake, and other open source technologies.
  • ML Ops: Over 20 presentations on managing the machine learning development lifecycle, including how to deploy models and monitor them in production. This is an area where open source projects and best practices are starting to emerge.
  • Data management and platforms: In a recent post, we introduced a new data management paradigm - “the lakehouse” - for the age of data, machine learning and AI. This year’s conference will have sessions on lakehouses and deep dives into various open source technologies for data management.
  • Performance and scalability: Over 40 sessions covering aspects of scaling and tuning machine learning models, Spark SQL and Apache Spark 3.0, analytics and data platforms, and end-to-end data applications.
  • Open source technologies: We have expanded the program to include dedicated sessions on open source projects in data and machine learning, including a series of technical presentations from contributors to several notable libraries and frameworks.

Come and Join Us

Join the community online and enjoy the camaraderie at Spark + AI Summit 2020. The conference pass is now free, so register now to save your spot! Also check out who is giving keynotes and all the courses offered on two days of expanded training.