
Tensorflow: chaining tf.gather() produces IndexedSlices warning

I'm running into an issue where chaining tf.gather() indexing produces the following warning:

/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.           
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

The scenario arises when one layer indexes into the input layer, performs some operation on the corresponding slice, and then the next layer indexes into the result. Here's a representative example:

import tensorflow as tf

## 10-Dimensional data will be fed to the model
X = tf.placeholder( tf.float32, [10, None] )

## W works with the first 3 features of a sample
W = tf.Variable( tf.ones( [5, 3] ) )
Xi = tf.gather( X, [0,1,2] )
mm = tf.matmul( W, Xi )

## Indexing into the result produces a warning during backprop
h = tf.gather( mm, [0,1] )
...
train_step = tf.train.AdamOptimizer(1e-4).minimize( loss )

The warning appears when train_step is defined and goes away if the second tf.gather() call is removed. It also goes away if X is given an explicit number of samples (e.g., [10, 1000]).

Thoughts?

Artem Sokolov asked Jan 06 '23 11:01


1 Answer

The gradient function of the tf.gather operation returns a value of type IndexedSlices. In your program, the input to the second tf.gather is the result of a tf.matmul (mm). Consequently, the gradient function for the matrix multiply is passed an IndexedSlices value.

Now, imagine what the gradient function for tf.matmul needs to do. To compute the gradient w.r.t. W, it has to multiply the incoming gradient by the transpose of Xi. In this case, the incoming gradient is an IndexedSlices, and Xi's transpose is a dense tensor (Tensor). TensorFlow doesn't have an implementation of matrix multiply that can operate on an IndexedSlices and a Tensor, so it simply converts the IndexedSlices to a Tensor before calling tf.matmul.
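To make that step concrete, here is a rough plain-Python stand-in for the backward pass (it is not TensorFlow code, and the helper names densify, matmul, and transpose are made up for illustration). It assumes the sparse gradient is represented as a list of row indices plus a list of row values, which is roughly what an IndexedSlices carries, and it uses N = 2 samples.

```python
# Plain-Python sketch of the backward pass; all helper names are hypothetical.

def densify(indices, values, num_rows):
    """Scatter sparse gradient rows into a zero matrix with num_rows rows."""
    num_cols = len(values[0])
    dense = [[0.0] * num_cols for _ in range(num_rows)]
    for row_index, row in zip(indices, values):
        # Rows that share an index are summed, as gradients accumulate.
        for col in range(num_cols):
            dense[row_index][col] += row[col]
    return dense

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Shapes follow the question: W is 5x3 and Xi is 3xN (N = 2 here).
Xi = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Gradient arriving from h = gather(mm, [0, 1]):
# only rows 0 and 1 of mm receive a gradient.
grad_indices = [0, 1]
grad_values = [[1.0, 1.0], [1.0, 1.0]]

# The dense matmul gradient needs a full 5xN matrix, so the sparse rows
# are first scattered into zeros, then multiplied by the transpose of Xi.
grad_mm = densify(grad_indices, grad_values, num_rows=5)
grad_W = matmul(grad_mm, transpose(Xi))  # 5x3, same shape as W
```

Note that densify needs num_rows and the row length up front; when N is only known at run time, the size of the dense result cannot be worked out while the graph is being built.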

If you look at the code for that conversion function, you'll notice that it prints a warning when the sparse-to-dense conversion might result in either a very large dense tensor (_LARGE_SPARSE_NUM_ELEMENTS determines how large) or a dense tensor of unknown size. When you declare your placeholder X with shape [10, None], the conversion happens on an IndexedSlices with unknown shape (really, only one of the dimensions is unknown, but it is still not possible to determine the resulting shape statically), hence the warning. Once you set the shape of X to [10, 1000], the shape of the IndexedSlices becomes fully specified, and the resulting dense tensor size is within the threshold, so the warning is not printed.
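The decision described above can be sketched in plain Python as follows. This is only an approximation of the logic, not the TensorFlow source: the function name conversion_warning is hypothetical, None stands in for a statically unknown dimension, and the threshold value is an assumption chosen for illustration (the real one is whatever _LARGE_SPARSE_NUM_ELEMENTS is set to in your TensorFlow version).

```python
# Assumed threshold for illustration only; check _LARGE_SPARSE_NUM_ELEMENTS
# in your TensorFlow version for the real value.
ASSUMED_LARGE_NUM_ELEMENTS = 10 ** 8

def conversion_warning(dense_shape, threshold=ASSUMED_LARGE_NUM_ELEMENTS):
    """Return a short reason for warning, or None if no warning is needed.

    dense_shape is a list of ints, with None for a statically unknown dim.
    """
    if any(dim is None for dim in dense_shape):
        return "dense Tensor of unknown shape"
    num_elements = 1
    for dim in dense_shape:
        num_elements *= dim
    if num_elements >= threshold:
        return "dense Tensor with %d elements" % num_elements
    return None

# mm has shape [5, N], where N is the number of columns of X.
unknown = conversion_warning([5, None])  # X declared as [10, None]
known = conversion_warning([5, 1000])    # X declared as [10, 1000]
```

Under these assumptions, unknown carries a warning reason while known is None, which matches the behaviour reported in the question.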

For your computation, if you simply cannot avoid the tf.gather on the result of a tf.matmul, then I would not worry about this warning too much, unless the number of columns in X is extremely large.

keveman answered Jan 13 '23 09:01
