Session

Spark DSV2: Growing Up Fast

Overview

Experience	In Person
Track	Data Engineering & Streaming
Industry	Enterprise Technology
Technologies	Databricks SQL
Skill Level	Intermediate

Spark’s DataSource V2 integration has taken a major step forward, beginning with a procedure catalog and row identifier support that enable richer table management and row-level operations across modern data sources. Building on this foundation, recent updates make schema evolution in MERGE INTO safer and improve partition filtering for more efficient query planning and execution. DML summaries now give clearer visibility into write behavior, and several important caching fixes resolve long-standing correctness issues in Spark execution. DataSource V2 also now supports first-class SQL features such as table constraints, complex default values, and generated columns, laying the groundwork for richer table semantics. We’ll also look ahead to deeper Change Data Feed (CDF) integration in Spark for robust incremental processing, covering what’s available today and what’s coming next as Spark continues to close the gap with traditional warehouse systems.

Session Speakers


Anton Okolnychyi

Senior Staff Software Engineer
Databricks

Szehon Ho

Software Engineer
Databricks