I am having trouble understanding the meaning and usage of TensorFlow Tensors and Sparse Tensors.
According to the documentation
Tensor
Tensor is a typed multi-dimensional array. For example, you can represent a mini-batch of images as a 4-D array of floating point numbers with dimensions [batch, height, width, channels].
Sparse Tensor
TensorFlow represents a sparse tensor as three separate dense tensors: indices, values, and shape. In Python, the three tensors are collected into a SparseTensor class for ease of use. If you have separate indices, values, and shape tensors, wrap them in a SparseTensor object before passing to the ops below.
My understanding is that Tensors are used for operations, inputs, and outputs, and that a Sparse Tensor is just another representation of a (dense?) Tensor. I hope someone can further explain the differences and the use cases for each.
A sparse tensor is a tensor in which most of the entries are zero; one example is a large diagonal matrix, which has many zero elements. It does not store all of the tensor's values, only the non-zero values and their coordinates.
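To make that concrete, here is a minimal sketch (the size and variable names are illustrative, not from the original answer): a 1000 x 1000 diagonal matrix has 1,000,000 entries, but a SparseTensor stores only the 1,000 non-zero diagonal values and their coordinates.

import tensorflow as tf

n = 1000
# Only the n diagonal entries are stored, not the full n*n grid of values.
diag = tf.SparseTensor(
    indices=[[i, i] for i in range(n)],  # coordinates of the non-zero entries
    values=tf.ones([n]),                 # the non-zero values themselves
    dense_shape=[n, n],                  # logical shape of the full matrix
)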
There are four main tensor types you can create: tf.Variable, tf.constant, tf.placeholder, and tf.SparseTensor.
Dense tensors store their values in a contiguous block of memory where every value is represented. Tensors, or multi-dimensional arrays, are used in a diverse range of data analysis applications.
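As a quick illustration (a hedged sketch; the shape is just an example), the mini-batch of images from the docs quote above is a dense 4-D tensor where every value, zeros included, occupies memory:

import tensorflow as tf

# A dense tensor stores all batch*height*width*channels values contiguously.
images = tf.zeros([32, 64, 64, 3], dtype=tf.float32)  # [batch, height, width, channels]
print(images.shape)  # (32, 64, 64, 3)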
PyTorch implements an extension of sparse tensors with scalar values to sparse tensors with (contiguous) tensor values. Such tensors are called hybrid tensors.
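A minimal PyTorch sketch of such a hybrid tensor (shapes chosen purely for illustration): one sparse dimension whose non-zero entries are themselves dense vectors.

import torch

i = torch.tensor([[0, 2]])            # coordinates along the single sparse dimension
v = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])      # each non-zero entry is a dense vector of length 3
x = torch.sparse_coo_tensor(i, v, size=(4, 3))  # 1 sparse dim (size 4) + 1 dense dim (size 3)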
Matthew did a great job, but I would love to give an example to shed more light on sparse tensors.
If a tensor has lots of values that are zero, it can be called sparse.
Let's consider a sparse 1-D tensor:
[0, 7, 0, 0, 8, 0, 0, 0, 0]
A sparse representation of the same tensor focuses only on the non-zero values:
values = [7,8]
We also have to remember where those values occur, by their indices:
indices = [1,4]
The one-dimensional form of indices will work with some methods for this one-dimensional example, but in general indices have multiple dimensions, so it is more consistent (and works everywhere) to represent them like this:
indices = [[1], [4]]
With values and indices we still don't have quite enough information: how many zeros are there? To answer that, we also record the dense shape of the tensor.
dense_shape = [9]
These three things together, values, indices, and dense_shape, form a sparse representation of the tensor.
In TensorFlow 2.x it can be constructed as:
import tensorflow as tf

x = tf.SparseTensor(values=[7, 8], indices=[[1], [4]], dense_shape=[9])
x
# output: <tensorflow.python.framework.sparse_tensor.SparseTensor at 0x7ff04a58c4a8>
print(x.values)
print(x.dense_shape)
print(x.indices)
# output:
tf.Tensor([7 8], shape=(2,), dtype=int32)
tf.Tensor([9], shape=(1,), dtype=int64)
tf.Tensor(
[[1]
[4]], shape=(2, 1), dtype=int64)
EDITED to correct indices as pointed out in the comments.
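To check that this is just another representation of the same dense tensor, you can convert it back with tf.sparse.to_dense (a short sketch reusing the tensor built above):

import tensorflow as tf

x = tf.SparseTensor(values=[7, 8], indices=[[1], [4]], dense_shape=[9])
print(tf.sparse.to_dense(x))
# tf.Tensor([0 7 0 0 8 0 0 0 0], shape=(9,), dtype=int32)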
The difference involves computational speed. If a large tensor has many, many zeros, it's faster to perform computations by iterating only over the non-zero elements. Therefore, you should store the data in a SparseTensor and use the special operations for SparseTensors.
The relationship is the same as that between matrices and sparse matrices. Sparse matrices are common in dynamic systems, and mathematicians have developed many special methods for operating on them.
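For example, TensorFlow provides dedicated ops such as tf.sparse.sparse_dense_matmul, which multiplies a sparse matrix by a dense one without ever materializing the zeros (a minimal sketch with made-up values):

import tensorflow as tf

# A 3x2 sparse matrix with only two non-zero entries, times a dense 2x4 matrix.
sp = tf.SparseTensor(indices=[[0, 0], [2, 1]], values=[3.0, 5.0], dense_shape=[3, 2])
dense = tf.ones([2, 4])
print(tf.sparse.sparse_dense_matmul(sp, dense))  # dense result of shape (3, 4)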