New Foundations of Delta Lake with Kernel and Spark's Data Source V2
Overview
| Experience | In Person |
|---|---|
| Track | Analytics & BI |
| Industry | Enterprise Technology |
| Technologies | AI/BI |
| Skill Level | Intermediate |
When the Delta Lake project started back in 2018, Spark had the very first version of its Data Source APIs, which allowed Spark to access and process data in external sources (like Cassandra, Kafka, etc.). These Data Source V1 (DSv1) APIs were limited to the basics of moving data from the source to Spark, leaving a LOT of the heavy lifting (stats-based data filtering, advanced upsert support, etc.) for each data source implementation to figure out on its own. Delta Lake built all the necessary intelligence to give users a fantastic, high-performance, intuitive experience and command support that has set the standard for the last 8 years. Now Spark as an engine has caught up: it has built much more of that intelligence into the engine itself and can do far more of the heavy lifting on behalf of data sources. These are the Data Source V2 (DSv2) APIs, and in this talk we will explore how we are updating Delta to use DSv2 to build new foundations for the next decade of Delta.
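The core DSv1-vs-DSv2 difference described above can be sketched in a toy model: in DSv1 the source hands every row to the engine and filtering happens afterwards, while in DSv2 the engine offers filters to the source before the scan so data can be skipped at the source. The classes below are simplified stand-ins for illustration only, not Spark's real API (the real interfaces live under `org.apache.spark.sql.connector.read`).

```python
from dataclasses import dataclass


@dataclass
class GreaterThan:
    """A stand-in for a pushed-down filter like Spark's sources.GreaterThan."""
    attr: str
    bound: int


class V1Relation:
    """DSv1 style: the source returns every row; filtering is the engine's job."""

    def __init__(self, rows):
        self.rows = rows

    def build_scan(self):
        # No filter information reaches the source: full scan every time.
        return list(self.rows)


class V2ScanBuilder:
    """DSv2 style: the engine pushes filters before the scan is built;
    the source keeps the ones it can evaluate and returns the leftovers
    for the engine to apply itself."""

    def __init__(self, rows):
        self.rows = rows
        self.pushed = []

    def push_filters(self, filters):
        accepted = [f for f in filters if isinstance(f, GreaterThan)]
        self.pushed = accepted
        # Anything the source cannot handle goes back to the engine.
        return [f for f in filters if f not in accepted]

    def build(self):
        # Only rows satisfying the pushed filters are ever produced.
        return [r for r in self.rows
                if all(r[f.attr] > f.bound for f in self.pushed)]


rows = [{"id": 1}, {"id": 5}, {"id": 9}]
builder = V2ScanBuilder(rows)
leftover = builder.push_filters([GreaterThan("id", 4)])
scanned = builder.build()  # the source itself skips rows with id <= 4
```

The same pushdown pattern, generalized to statistics-based file skipping and richer operations like upserts, is what lets the engine do heavy lifting that every DSv1 source previously had to reimplement.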
Session Speakers
Rahul Potharaju
Director of Engineering, Storage
Databricks
Tathagata Das
Sr. Staff Software Engineer
Databricks