I'm running into an issue where chaining tf.gather() indexing produces the following warning:
/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gradients.py:90: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
The scenario arises when one layer indexes into the input layer, performs some operation on the corresponding slice, and then the next layer indexes into the result. Here's a representative example:
import tensorflow as tf
## 10-Dimensional data will be fed to the model
X = tf.placeholder( tf.float32, [10, None] )
## W works with the first 3 features of a sample
W = tf.Variable( tf.ones( [5, 3] ) )
Xi = tf.gather( X, [0,1,2] )
mm = tf.matmul( W, Xi )
## Indexing into the result produces a warning during backprop
h = tf.gather( mm, [0,1] )
...
train_step = tf.train.AdamOptimizer(1e-4).minimize( loss )
The warning arises upon the definition of train_step and goes away if the second tf.gather() call is removed. The warning also goes away if X is given an explicit number of samples (e.g., [10, 1000]).
Thoughts?
The gradient function of the tf.gather operation returns an IndexedSlices-typed value. In your program, the input to the second tf.gather is the result of a tf.matmul (mm). Consequently, the gradient function for the matrix multiply is passed an IndexedSlices value.
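As a quick check (a minimal sketch in the same TF 1.x style as your snippet; the placeholder shape and the reduce_sum loss are only there for illustration), you can inspect the type of the gradient that tf.gather propagates back to its input:
import tensorflow as tf
## Sketch: the gradient tf.gather sends back to its input is an
## IndexedSlices rather than a regular dense Tensor.
x = tf.placeholder(tf.float32, [10, None])
rows = tf.gather(x, [0, 1, 2])
grad_x = tf.gradients(tf.reduce_sum(rows), x)[0]
print(type(grad_x))  ## IndexedSlices, not tf.Tensor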
Now, consider what the gradient function for tf.matmul needs to do. To compute the gradient w.r.t. W, it has to multiply the incoming gradient by the transpose of Xi. In this case, the incoming gradient is an IndexedSlices, while Xi's transpose is a dense Tensor. TensorFlow doesn't have an implementation of matrix multiplication that can operate on an IndexedSlices and a Tensor, so it simply converts the IndexedSlices to a Tensor before calling tf.matmul.
If you look at the code for that conversion function here, you'll notice that it prints a warning when the sparse-to-dense conversion might result in either a very large dense tensor (_LARGE_SPARSE_NUM_ELEMENTS determines how large) or a dense tensor of unknown size. When your placeholder X has shape [10, None], this conversion happens on an IndexedSlices of unknown shape (really, only one of the dimensions is unknown, but it is still impossible to determine the resulting shape statically), hence the warning. Once you set the shape of X to [10, 1000], the shape of the IndexedSlices becomes fully specified AND the resulting dense tensor size is within the threshold, so the warning is no longer printed.
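For instance (a sketch based on your snippet, with a made-up reduce_sum loss just so minimize() has something to differentiate), fully specifying the placeholder shape keeps the conversion silent:
## With a fully specified shape, the IndexedSlices' dense_shape is known
## statically, and the densified gradient (shape [5, 1000]) is far below
## the size threshold, so minimize() no longer triggers the warning.
X = tf.placeholder(tf.float32, [10, 1000])
W = tf.Variable(tf.ones([5, 3]))
h = tf.gather(tf.matmul(W, tf.gather(X, [0, 1, 2])), [0, 1])
loss = tf.reduce_sum(h)
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)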
For your computation, if you simply cannot avoid the tf.gather on the result of the tf.matmul, then I wouldn't worry about this warning too much, unless the number of columns in X is extremely large.
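As a rough sanity check (assuming float32 and the shapes from your example, where the densified gradient has shape [5, num_columns]), the memory cost of the conversion is easy to estimate:
## Hypothetical estimate of the densified gradient's size for this example.
num_columns = 1000000                ## e.g., one million samples in X
bytes_needed = 5 * num_columns * 4   ## [5, num_columns] float32 values
print(bytes_needed / 1e6, "MB")      ## ~20 MB, still harmless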