SESSION

Delta Tensor: Efficient Vector and Tensor Storage in Delta Lake

OVERVIEW

EXPERIENCE: In Person
TYPE: Lightning Talk
TRACK: Data Engineering and Streaming
TECHNOLOGIES: AI/Machine Learning, Delta Lake, Developer Experience
SKILL LEVEL: Intermediate
DURATION: 20 min

In machine learning workflows, data is consumed as tensors. Unfortunately, most input data arrives in a variety of other formats and requires onerous, inefficient loading and storing steps. In this talk, we present Delta Tensor, an approach for storing tensors directly in Delta Lake. Besides delegating data loading to the query engine, Delta Tensor uses chunking to reduce the I/O cost of tensor slicing, and sparse encoding methods to significantly improve the storage efficiency of sparse tensors. Together these provide an efficient storage and management solution in a cloud-native Lakehouse environment.
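
The two ideas named in the abstract, sparse encoding and chunking, can be illustrated with a small sketch. This is not the Delta Tensor implementation: the function names (to_coo, chunk_2d, chunks_for_slice) are hypothetical, coordinate (COO) encoding is only one possible sparse encoding, and plain Python lists and dicts stand in for rows of a Delta table.

```python
from itertools import product


def to_coo(tensor, shape):
    """Encode a dense nested-list tensor as COO rows: (indices, value) per non-zero."""
    rows = []
    for idx in product(*(range(n) for n in shape)):
        value = tensor
        for i in idx:
            value = value[i]
        if value != 0:
            rows.append((idx, value))
    return rows


def chunk_2d(matrix, chunk_rows, chunk_cols):
    """Split a 2-D matrix into fixed-size blocks keyed by chunk coordinates."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    chunks = {}
    for r0 in range(0, n_rows, chunk_rows):
        for c0 in range(0, n_cols, chunk_cols):
            block = [row[c0:c0 + chunk_cols] for row in matrix[r0:r0 + chunk_rows]]
            chunks[(r0 // chunk_rows, c0 // chunk_cols)] = block
    return chunks


def chunks_for_slice(row_range, col_range, chunk_rows, chunk_cols):
    """List the chunk keys that a half-open slice [start, stop) touches."""
    r_start, r_stop = row_range
    c_start, c_stop = col_range
    return [
        (r, c)
        for r in range(r_start // chunk_rows, (r_stop - 1) // chunk_rows + 1)
        for c in range(c_start // chunk_cols, (c_stop - 1) // chunk_cols + 1)
    ]


# Hypothetical 4x4 tensor with three non-zero entries.
dense = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [7, 0, 0, 0],
    [0, 0, 0, 1],
]

# Sparse encoding: only the non-zero entries become rows.
coo_rows = to_coo(dense, (4, 4))

# Chunking: store 2x2 blocks; a slice then reads only the blocks it overlaps.
chunks = chunk_2d(dense, 2, 2)
needed = chunks_for_slice((0, 2), (2, 4), 2, 2)
```

In this toy case the sparse form holds 3 rows instead of 16 values, and slicing rows 0-1, columns 2-3 touches one of the four chunks rather than the whole tensor. In a table-backed layout, each COO row or each chunk would presumably map to one table row, so that a query engine can filter on the index or chunk-key columns.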

SESSION SPEAKERS

Zhiyu Wu
Student, Northeastern University