Getting Started with Delta Lake

Making Apache Spark Better with Delta Lake

Series Details

This session is part of the Getting Started with Delta Lake series with Denny Lee and the Delta Lake team.

Session Abstract

Join Michael Armbrust, head of the Delta Lake engineering team, to learn how his team built upon Apache Spark to bring ACID transactions and other data reliability technologies from the data warehouse world to cloud data lakes.

Apache Spark is the dominant processing framework for big data. Delta Lake adds reliability to Spark so your analytics and machine learning initiatives have ready access to quality, reliable data. This webinar covers the use of Delta Lake to enhance data reliability for Spark environments.

Topic areas include:

  • The role of Apache Spark in big data processing
  • Use of data lakes as an important part of the data architecture
  • Data lake reliability challenges
  • How Delta Lake helps provide reliable data for Spark processing
  • Specific improvements that Delta Lake adds
  • The ease of adopting Delta Lake for powering your data lake

What you need:
Sign up for Databricks Community Edition to access the workshop presentation materials and sample notebooks.

Michael Armbrust. Principal Software Engineer at Databricks
Michael Armbrust is a committer and PMC member of Apache Spark and the original creator of Spark SQL. He currently leads the team at Databricks that designed and built Structured Streaming and Databricks Delta. He received his PhD from UC Berkeley in 2013, and was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications, and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage, and query optimization.

Video Transcript

– [Denny] Hi, everybody. Welcome to our webinar today, Making Apache Spark Better with Delta Lake.

Before we get started with today’s presentation, we wanted to go over a few housekeeping items to ensure that you have the best possible experience. Please note that your audio connections will be muted for the webinar for everyone’s viewing comfort. If you have any concerns or questions, please pose those questions in the question panel or chat. In that panel we encourage you to use this time to ask as many questions and clarify any doubts that you may have on today’s topic. Our key presenter today, Michael Armbrust, is the original creator of Spark SQL and Structured Streaming, and one of the primary creators of Delta Lake. He’s the principal engineer at Databricks, and so without any further delay, take it away Michael.

– [Michael] Thank you, Denny. I’m super excited to be here today to talk about how you can make Apache Spark better by using Delta Lake. However, before I jump into that, I wanna start by talking about this concept of a data lake and why so many people are excited with it, and also why there’s a lot of challenges when they try to set these things up as well.