Diving Into Delta Lake

Tech Talk Series


Join us for a three-part learning series: Diving Into Delta Lake. This series of tech talks takes you through the internals of Delta Lake, a popular open-source technology that enables ACID transactions, time travel, schema enforcement, and more on top of your data lakes. Your guides for this dive are Burak Yavuz, Andreas Neumann, and Tathagata “TD” Das of the Delta Lake engineering team, along with Developer Advocate Denny Lee.

Many of the workshops include notebooks and links to slides for you to download.

If you’d like to follow along, sign up for a free Community Edition account or download the Delta Lake library.

Just getting started? Check out our Getting Started with Delta Lake tech talk series.

Unpacking the Transaction Log

The transaction log is key to understanding Delta Lake because it is the common thread that runs through many of its most important features, including ACID transactions, scalable metadata handling, time travel, and more. In this session, we’ll explore what the Delta Lake transaction log is, how it works at the file level, and how it offers an elegant solution to the problem of multiple concurrent reads and writes.
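As a rough illustration of the file-level idea, the sketch below replays a toy log in plain Python. It assumes the general layout of the Delta Lake log (ordered, zero-padded JSON commit files in which each line is an action such as `add` or `remove`), but the commit contents, the `LOG` dictionary, and the `snapshot` helper are hypothetical, and real commits carry much more metadata than a file path.

```python
import json

# Hypothetical, heavily simplified commits. Each key mimics a commit file name
# (a zero-padded version number); each line of the value is one JSON action.
LOG = {
    "00000000000000000000.json": (
        '{"add": {"path": "part-0000.parquet"}}\n'
        '{"add": {"path": "part-0001.parquet"}}'
    ),
    "00000000000000000001.json": (
        '{"remove": {"path": "part-0000.parquet"}}\n'
        '{"add": {"path": "part-0002.parquet"}}'
    ),
    "00000000000000000002.json": '{"add": {"path": "part-0003.parquet"}}',
}


def snapshot(log, version=None):
    """Replay commits in order and return the set of data files in the table.

    If `version` is given, stop after that commit (a toy form of time travel).
    """
    active = set()
    for name in sorted(log):
        commit_version = int(name.split(".")[0])
        if version is not None and commit_version > version:
            break
        for line in log[name].splitlines():
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return active


latest = snapshot(LOG)                # table state after all commits
as_of_v0 = snapshot(LOG, version=0)   # table state as of the first commit
```

Because a reader only ever sees the files listed by the commits it has replayed, a commit that has not finished is invisible to it. The sketch leaves out how concurrent writers are reconciled, along with checkpoints and table metadata.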

Enforcing and Evolving the Schema

As business problems and requirements evolve over time, so too does the structure of your data. With Delta Lake, incorporating new dimensions as the data changes is easy. Users have access to simple semantics to control the schema of their tables, including schema enforcement and schema evolution.
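The difference between enforcement and evolution can be sketched with a toy model in plain Python. This is not Delta Lake code: the `write` function and the dictionary schemas are hypothetical, the `merge_schema` flag is only loosely modeled on Delta Lake's `mergeSchema` write option, and the sketch rejects every type change, which is a simplification.

```python
def write(table_schema, batch_schema, merge_schema=False):
    """Return the table schema after a write, or raise ValueError.

    Schemas are plain dicts mapping column name -> type name (an assumption
    made for this sketch only).
    """
    # Columns present in both schemas must agree on type.
    mismatched = [
        col for col, typ in batch_schema.items()
        if col in table_schema and table_schema[col] != typ
    ]
    if mismatched:
        raise ValueError(f"type mismatch for columns: {mismatched}")

    new_cols = {c: t for c, t in batch_schema.items() if c not in table_schema}
    if new_cols and not merge_schema:
        # Enforcement: unexpected columns cause the write to be rejected.
        raise ValueError(f"unexpected columns: {sorted(new_cols)}")

    # Evolution: new columns are appended to the table schema.
    return {**table_schema, **new_cols}


table = {"id": "long", "name": "string"}
batch = {"id": "long", "name": "string", "country": "string"}

try:
    write(table, batch)
    rejected = False
except ValueError:
    rejected = True

evolved = write(table, batch, merge_schema=True)
```

In this model the default is strict, so a write that introduces `country` fails unless the caller opts in to evolution, and the table schema only changes as a deliberate choice.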

DML Internals: Delete, Update, Merge

In this session, we dive deeper into commits and snapshot isolation, and into how partitions and files change when you perform deletes, updates, and merges, or write with Structured Streaming.

Just Getting Started?

This series of tech talk tutorials takes you through the technology foundation of Delta Lake (Apache Spark) and the capabilities Delta Lake adds to it to power cloud data lakes.
