Data + AI Summit 2022

DELETE, UPDATE, MERGE Operations in Data Source V2

On Demand

Type: Session
Format: Hybrid
Track: Data Lakes, Data Warehouses and Data Lakehouses
Difficulty: Intermediate
Room: Moscone South | Level 2 | 202
Duration: 35 min

Overview

If you’ve ever had to delete a set of records for regulatory compliance, update a set of records to fix an issue in the ingestion pipeline, or apply changes from a transaction log to a fact table, you know that row-level operations are becoming critical for modern data lake workflows. This talk will focus on upcoming features in Spark 3.3 that enable the execution of row-level operations and let Spark tell connectors exactly which rows to delete, update, or insert. As a result, data sources won’t have to provide low-level SQL extensions for Spark and can instead benefit from a scalable, built-in implementation that works across all connectors. The presentation will be useful for data source developers as well as for data engineers and analysts interested in performing DELETE, UPDATE, and MERGE operations in Spark.
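The central idea in the abstract is a division of labor: the engine works out which rows a statement deletes, updates, or inserts, and the connector only applies that result. The sketch below illustrates this split in plain Python for a simple MERGE. It is not Spark's Data Source V2 API. `RowLevelChanges`, `plan_merge`, `InMemoryTable`, the single key column, and the convention that a source row with `v=None` marks a deletion are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class RowLevelChanges:
    """Hypothetical changeset an engine could hand to a connector."""
    deletes: List[object] = field(default_factory=list)  # keys of rows to remove
    updates: List[Row] = field(default_factory=list)     # new versions of matched rows
    inserts: List[Row] = field(default_factory=list)     # rows with no match in the target


def plan_merge(target: List[Row], source: List[Row], key: str,
               delete_if: Callable[[Row], bool]) -> RowLevelChanges:
    """Engine side (hypothetical): evaluate a MERGE equivalent to
         MERGE INTO target USING source ON target.<key> = source.<key>
         WHEN MATCHED AND delete_if(source) THEN DELETE
         WHEN MATCHED THEN UPDATE SET *
         WHEN NOT MATCHED THEN INSERT *
    """
    target_keys = {row[key] for row in target}
    matched = set()
    changes = RowLevelChanges()
    for src in source:
        k = src[key]
        if k in target_keys:
            # SQL MERGE forbids one target row being matched by several source rows.
            if k in matched:
                raise ValueError(f"target row {key}={k!r} matched more than once")
            matched.add(k)
            if delete_if(src):
                changes.deletes.append(k)
            else:
                changes.updates.append(dict(src))
        else:
            changes.inserts.append(dict(src))
    return changes


class InMemoryTable:
    """Connector side (hypothetical): applies a changeset without parsing any SQL."""

    def __init__(self, rows: List[Row], key: str):
        self.key = key
        self.rows = {row[key]: dict(row) for row in rows}

    def apply(self, changes: RowLevelChanges) -> None:
        for k in changes.deletes:
            del self.rows[k]
        for row in changes.updates + changes.inserts:
            self.rows[row[self.key]] = dict(row)


if __name__ == "__main__":
    target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
    source = [{"id": 2, "v": "B"}, {"id": 3, "v": None}, {"id": 4, "v": "d"}]
    # Assumed convention for this sketch: v=None in the source marks a deletion.
    changes = plan_merge(target, source, key="id", delete_if=lambda s: s["v"] is None)
    table = InMemoryTable(target, key="id")
    table.apply(changes)
    print(changes)
    print(table.rows)
```

Here row 3 is deleted, row 2 is updated, and row 4 is inserted, while row 1 is untouched. The connector's `apply` method stays the same whichever statement produced the changeset. A DELETE or UPDATE would simply yield a changeset with some of the lists empty, which is the kind of connector-independent reuse the talk describes.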

Session Speakers

Anton Okolnychyi

Software Engineer

Apple
