What is a Sparse Tensor?

A memory-efficient tensor that stores only non-zero values with their indices and shape, optimizing storage and computation for sparse datasets

by Databricks Staff

  • A sparse tensor represents data in which most values are zero by storing only the non-zero entries and their positions instead of every element in the array.
  • This design can dramatically cut memory usage and computation for very large, mostly empty datasets such as text embeddings, recommendation matrices or scientific measurements.
  • Sparse tensor libraries provide functions to convert between sparse and dense forms, count non-zero values, and reshape or permute dimensions, so developers can work efficiently with high-dimensional data.
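
To make the memory claim concrete, here is a minimal sketch using NumPy. The matrix size, the number of non-zero entries and the variable names are assumptions chosen for illustration; the exact savings depend on the data types and on how sparse the data really is.

```python
import numpy as np

# Hypothetical example: a 1000 x 1000 matrix with only 100 non-zero entries.
shape = (1000, 1000)
rng = np.random.default_rng(0)

# Pick 100 distinct flat positions, then convert them to (row, col) coordinates.
flat = rng.choice(shape[0] * shape[1], size=100, replace=False)
subs = np.column_stack(np.unravel_index(flat, shape))  # one row per stored entry
vals = rng.random(100) + 1.0                            # shifted so no value is zero

# The dense form stores every element, zeros included.
dense = np.zeros(shape)
dense[tuple(subs.T)] = vals

# The sparse form stores only the coordinates and the values.
dense_bytes = dense.nbytes
sparse_bytes = subs.nbytes + vals.nbytes
print(f"dense: {dense_bytes} bytes, sparse: {sparse_bytes} bytes")
```

In this sketch the coordinate arrays take a small fraction of the space of the dense array, because the cost grows with the number of non-zero entries rather than with the total number of elements.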

NumPy (numpy) is a Python library for manipulating multi-dimensional arrays, and understanding how it is organized and used is a primary requirement for developing the pytensor library. In pytensor, sptensor is the class that represents a sparse tensor. A sparse tensor is a dataset in which most of the entries are zero; one example is a large diagonal matrix, where every entry off the diagonal is zero. Rather than storing every value of the tensor object, sptensor stores only the non-zero values and their corresponding coordinates. Sparse tensor storage formats therefore reduce storage requirements and eliminate unnecessary computations involving zero values. Here are its main attributes:

  • vals (numpy.ndarray) A 1-dimensional array of non-zero values of the sparse tensor.
  • subs (numpy.ndarray) A 2-dimensional array of coordinates of the values in vals.
  • shape (tuple) The shape of the sparse tensor.

  • func (binary operator) The accumulator function used when constructing the sparse tensor.
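
The attributes above can be sketched with plain NumPy arrays. The helper accumulate below is hypothetical and is not part of pytensor; it assumes that the role of func is to combine values that share the same coordinate, which is one common reading of "accumulator" and may differ from the library's actual behavior.

```python
import numpy as np

# A 3 x 4 tensor with three stored entries; all other positions are implicitly zero.
subs = np.array([[0, 1], [2, 3], [0, 1]])   # coordinate (0, 1) appears twice
vals = np.array([5.0, 7.0, 2.0])
shape = (3, 4)

def accumulate(subs, vals, func=sum):
    """Combine values that share a coordinate using func (assumed behavior)."""
    grouped = {}
    for coord, val in zip(map(tuple, subs.tolist()), vals.tolist()):
        grouped.setdefault(coord, []).append(val)
    coords = sorted(grouped)
    new_subs = np.array(coords)
    new_vals = np.array([func(grouped[c]) for c in coords])
    return new_subs, new_vals

# With the default accumulator, duplicate entries are summed.
acc_subs, acc_vals = accumulate(subs, vals)

# A different accumulator, such as max, keeps the largest duplicate instead.
max_subs, max_vals = accumulate(subs, vals, func=max)
```

Under this assumption, the two entries at coordinate (0, 1) collapse into a single stored value, so each coordinate appears at most once in the result.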

On top of that, its main functions are:

  • __init__(self, subs, vals, shape=None, func=sum.__call__) Constructor for the sptensor class. subs and vals, each given as a numpy.ndarray or a list, are the coordinates and values of the sptensor.
  • tondarray(self) Returns a numpy.ndarray object that has the same values as the sptensor.
  • permute(self, order) Returns the sptensor object permuted by the given order (list).
  • ipermute(self, order) Returns the sptensor object permuted by the inverse of the given order (list).
  • copy(self) Returns a copy of the sptensor object.
  • totensor(self) Returns the tensor object that has the same values as the sptensor.
  • nnz(self) Returns the number of non-zero elements in the sptensor.
  • ndims(self) Returns the number of dimensions of the tensor.
  • dimsize(self, ind) Returns the size of the specified dimension. Same as shape[ind].
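
The functions listed above can be illustrated with a small coordinate-format class. This is a sketch written for this article, not the pytensor source: the class name SketchSpTensor is hypothetical, the func argument is left out, and totensor is omitted because the dense tensor class is not described here.

```python
import numpy as np

class SketchSpTensor:
    """Illustrative coordinate-format sparse tensor; not the pytensor implementation."""

    def __init__(self, subs, vals, shape=None):
        self.subs = np.asarray(subs, dtype=int)   # one row of coordinates per value
        self.vals = np.asarray(vals)
        if shape is None:
            # Assumption: infer the smallest shape that fits every coordinate.
            shape = tuple(int(m) + 1 for m in self.subs.max(axis=0))
        self.shape = tuple(shape)

    def tondarray(self):
        # Start from zeros and write each stored value at its coordinate.
        dense = np.zeros(self.shape, dtype=self.vals.dtype)
        dense[tuple(self.subs.T)] = self.vals
        return dense

    def permute(self, order):
        # Reorder both the coordinate columns and the shape.
        order = list(order)
        new_shape = tuple(self.shape[i] for i in order)
        return SketchSpTensor(self.subs[:, order], self.vals.copy(), new_shape)

    def ipermute(self, order):
        # argsort of a permutation yields its inverse permutation.
        return self.permute(np.argsort(order).tolist())

    def copy(self):
        return SketchSpTensor(self.subs.copy(), self.vals.copy(), self.shape)

    def nnz(self):
        return int(np.count_nonzero(self.vals))

    def ndims(self):
        return len(self.shape)

    def dimsize(self, ind):
        return self.shape[ind]

# Usage: a 2 x 3 x 4 tensor with two non-zero entries.
t = SketchSpTensor([[0, 0, 1], [1, 2, 3]], [4.0, 9.0], shape=(2, 3, 4))
dense = t.tondarray()
p = t.permute([2, 0, 1])
```

In this sketch, permute follows the same axis convention as numpy.transpose, so converting the permuted object to a dense array gives the same result as transposing the dense array directly.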
