SESSION

Towards Multi-Statement Transactions in Delta Lake

OVERVIEW

EXPERIENCE: In Person
TYPE: Lightning Talk
TRACK: Data Lakehouse Architecture
INDUSTRY: Health and Life Sciences, Travel and Hospitality, Financial Services
TECHNOLOGIES: Delta Lake, Governance
SKILL LEVEL: Intermediate
DURATION: 20 min

This talk discusses managed-commits, a new commit protocol for Delta Lake that moves the source of commit atomicity from the object store to an external commit owner (e.g., HMS, Unity Catalog, or Glue). This gives more flexibility in how transactions are performed and lays the foundation for advanced features such as multi-statement transactions. Delta was originally built on the premise that cloud storage is the source of truth. However, cloud storage offers limited primitives for atomicity; more specifically, object stores lack the means to atomically commit more than a single write or statement. The managed-commits protocol aims to address the following:

  • Support multi-table-multi-statement transactions.
  • Provide reliable commit semantics even when the underlying object store lacks put-if-absent semantics (e.g., S3).
  • Support data governance overwrite operations.
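
As a rough illustration of the idea behind the goals above, the sketch below models a commit owner that arbitrates commits itself, so atomicity no longer depends on the object store offering put-if-absent. It is a toy in-memory model, not the managed-commits protocol or any Delta Lake API: the class `InMemoryCommitOwner`, its `commit` method, and the `CommitConflict` exception are hypothetical names invented for this example.

```python
import threading


class CommitConflict(Exception):
    """Raised when a writer's expected table version is stale."""


class InMemoryCommitOwner:
    """Hypothetical stand-in for an external commit owner (e.g., a catalog)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}  # table name -> latest committed version
        self._log = {}     # (table name, version) -> commit payload

    def latest_version(self, table):
        # -1 means the table has no commits yet.
        with self._lock:
            return self._latest.get(table, -1)

    def commit(self, changes):
        """Commit {table: (expected_version, payload)} for all tables or none."""
        with self._lock:
            # Validate every table first so a conflict leaves no partial state.
            for table, (expected, _payload) in changes.items():
                if self._latest.get(table, -1) != expected:
                    raise CommitConflict(f"stale version for table {table!r}")
            # All checks passed: apply every change under the same lock.
            for table, (expected, payload) in changes.items():
                self._log[(table, expected + 1)] = payload
                self._latest[table] = expected + 1
            return {table: self._latest[table] for table in changes}


owner = InMemoryCommitOwner()

# One transaction spanning two tables commits as a unit.
first = owner.commit({"orders": (-1, "add file A"), "customers": (-1, "add file B")})

# A second writer with a stale view of "customers" is rejected as a whole,
# so "orders" is not advanced either.
try:
    owner.commit({"orders": (0, "add file C"), "customers": (-1, "add file D")})
    conflict_detected = False
except CommitConflict:
    conflict_detected = True
```

Because the owner, rather than the storage layer, decides which commit wins, the same check-then-apply step can cover several tables at once in this model.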

SESSION SPEAKERS

Prakhar Jain

Staff Software Engineer
Databricks