SESSION

Variant Data Type - Making Semi-Structured Data Fast and Simple


OVERVIEW

EXPERIENCE: In Person
TYPE: Breakout
TRACK: Data Engineering and Streaming
TECHNOLOGIES: Databricks Experience (DBX), Delta Lake, Developer Experience
SKILL LEVEL: Intermediate
DURATION: 40 min

The lakehouse architecture can store and process a wide variety of data, including semi-structured data such as JSON. One of the main benefits of semi-structured data is its flexible schema: the schema does not need to be defined up front and can evolve over time. However, for warehousing applications, handling evolving semi-structured data can be challenging. Alternatively, storing semi-structured data as a plain string is very flexible, but the string must be parsed every time it is queried, which can significantly degrade performance.

We introduce the Variant data type to make semi-structured data processing fast and simple. The Variant data type stores semi-structured data flexibly, without requiring a predefined schema. Its binary encoding also allows the data to be processed much faster than parsing strings.
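To illustrate why a binary encoding can beat string parsing, here is a minimal sketch of a toy format, assuming flat objects whose values are only integers or strings. The layout, the tag values, and the function names (`encode_object`, `get_field`, `get_field_from_string`) are all hypothetical and are NOT the actual Variant binary encoding. The idea it shows: keys are stored sorted with offset tables, so a reader can binary-search for one field and decode only that value, while the string baseline has to parse the entire JSON document to read a single field.

```python
import json
import struct

# Hypothetical value tags for this toy format (not the real Variant spec).
TAG_INT = 0  # tag byte + signed 64-bit little-endian integer
TAG_STR = 1  # tag byte + u32 byte length + UTF-8 bytes


def encode_value(value) -> bytes:
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError("toy encoder supports only int and str values")
    if isinstance(value, int):
        return struct.pack("<Bq", TAG_INT, value)
    data = value.encode("utf-8")
    return struct.pack("<BI", TAG_STR, len(data)) + data


def encode_object(obj: dict) -> bytes:
    """Toy layout:
    [n:u32][key_area_len:u32][key_off:u32 * n][val_off:u32 * n][key area][value area]
    Keys are sorted by their UTF-8 bytes so readers can binary-search them.
    Offsets are relative to the start of the key area / value area."""
    keys = sorted(obj, key=lambda k: k.encode("utf-8"))
    key_area, val_area = b"", b""
    key_offs, val_offs = [], []
    for k in keys:
        kb = k.encode("utf-8")
        key_offs.append(len(key_area))
        key_area += struct.pack("<H", len(kb)) + kb  # u16 length + key bytes
        val_offs.append(len(val_area))
        val_area += encode_value(obj[k])
    n = len(keys)
    header = struct.pack(f"<II{n}I{n}I", n, len(key_area), *key_offs, *val_offs)
    return header + key_area + val_area


def decode_value(buf: bytes, pos: int):
    tag = buf[pos]
    if tag == TAG_INT:
        return struct.unpack_from("<q", buf, pos + 1)[0]
    if tag == TAG_STR:
        (length,) = struct.unpack_from("<I", buf, pos + 1)
        return buf[pos + 5:pos + 5 + length].decode("utf-8")
    raise ValueError(f"unknown tag {tag}")


def get_field(buf: bytes, name: str):
    """Binary-search the sorted keys and decode only the matching value."""
    n, key_area_len = struct.unpack_from("<II", buf, 0)
    key_base = 8 + 8 * n
    val_base = key_base + key_area_len
    target = name.encode("utf-8")
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        (koff,) = struct.unpack_from("<I", buf, 8 + 4 * mid)
        (klen,) = struct.unpack_from("<H", buf, key_base + koff)
        start = key_base + koff + 2
        key = buf[start:start + klen]
        if key < target:
            lo = mid + 1
        elif key > target:
            hi = mid
        else:
            (voff,) = struct.unpack_from("<I", buf, 8 + 4 * n + 4 * mid)
            return decode_value(buf, val_base + voff)
    return None  # field not present


def get_field_from_string(text: str, name: str):
    # Baseline: the whole JSON string is parsed just to read one field.
    return json.loads(text).get(name)


if __name__ == "__main__":
    record = {"user": "alice", "clicks": 42, "country": "NZ", "session": "s-001"}
    print(get_field(encode_object(record), "clicks"))
    print(get_field_from_string(json.dumps(record), "clicks"))
```

The sorted keys and offset tables are what let `get_field` skip unrelated values entirely; a real encoding also has to handle nested objects, arrays, and many more scalar types, which this sketch deliberately leaves out.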

In this session, we will introduce the Variant data type, present the details of the Variant binary encoding, and show the benefits of Variant with performance results.

SESSION SPEAKERS

Gene Pang

Software Engineer
Databricks

Chenhao Li

Software Engineer
Databricks