
TensorFlow: computing gradients for a sparse variable

I am trying to train a sparse variable in TensorFlow. As far as I know, TensorFlow currently does not support sparse variables.

I found two threads discussing similar issues: using-sparsetensor-as-a-trainable-variable and update-only-part-of-the-word-embedding-matrix-in-tensorflow. I do not quite understand the answers, and it would help to see some example code.

One approach I have tried is:

import numpy as np
import tensorflow as tf

# Initialize the sparse variable sp_weights.
# Assumes w_s is the input sparse matrix (a tf.SparseTensor) that holds the index information.
dim = 20
identity = tf.constant(np.identity(dim), dtype=tf.float32)
A = tf.sparse_tensor_dense_matmul(w_s, identity)  # convert w_s to dense
w_init = tf.random_normal([dim, dim], mean=0.0, stddev=0.1)
# Randomly initialize the non-zero entries (tf.mul was renamed tf.multiply in TensorFlow 1.0).
w_tensor = tf.multiply(A, w_init)
vars['sp_weights'] = tf.Variable(w_tensor)

# doing some operations...
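
To make the masking step above concrete, here is a minimal NumPy sketch of the same arithmetic. NumPy only stands in for TensorFlow here, and the index list and the name `sparsity_mask` are made up for illustration. It builds a dense 0/1 mask from a list of (row, col) positions and multiplies it into a random initial matrix, so that only the listed positions start out non-zero.

```python
import numpy as np

dim = 4
# Hypothetical non-zero positions, standing in for the indices of w_s.
indices = np.array([[0, 1], [2, 3], [3, 0]])

# Dense 0/1 mask: 1 where a weight is allowed to exist, 0 elsewhere.
sparsity_mask = np.zeros((dim, dim), dtype=np.float32)
sparsity_mask[indices[:, 0], indices[:, 1]] = 1.0

rng = np.random.default_rng(0)
w_init = rng.normal(loc=0.0, scale=0.1, size=(dim, dim)).astype(np.float32)

# The element-wise product keeps random values only at the masked positions.
sp_weights = sparsity_mask * w_init
```

Keeping the mask around is useful later, because the same mask can be applied to the gradient.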

When computing the gradients, I followed the second link and used tf.IndexedSlices:

grad = opt.compute_gradients(loss)
train_op = opt.apply_gradients(
    [tf.IndexedSlices(grad, indices)]) # indices is extracted from w_s

The above code of course doesn't work, and I am confused here. tf.IndexedSlices wraps its input in an IndexedSlices instance, but how do I use it to update the gradients at the given indices? Also, many people mention using tf.scatter_add/sub/update, but the official documentation doesn't contain any example showing how and where to use them for a gradient update. Should I use tf.IndexedSlices or the tf.scatter ops? Example code would be very helpful. Thank you!
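
For readers weighing the two options in the question, the sketch below shows only the arithmetic behind them, in NumPy rather than TensorFlow. tf.IndexedSlices describes slices along the first dimension (whole rows), so it does not by itself select individual entries; an element-wise mask does. The function names `masked_sgd_step` and `row_sparse_sgd_step` are hypothetical, and plain SGD is assumed as the update rule.

```python
import numpy as np

def masked_sgd_step(weights, grad, mask, lr):
    """Plain SGD step that only moves entries where mask == 1."""
    return weights - lr * (grad * mask)

def row_sparse_sgd_step(weights, row_values, row_indices, lr):
    """SGD step from a row-sparse gradient: row_values[i] applies to row row_indices[i]."""
    updated = weights.copy()
    # np.subtract.at accumulates correctly even if a row index repeats.
    np.subtract.at(updated, row_indices, lr * row_values)
    return updated

weights = np.ones((3, 3), dtype=np.float32)
grad = np.full((3, 3), 2.0, dtype=np.float32)
mask = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 0]], dtype=np.float32)

# Entries (0, 0) and (1, 2) become 0.0; all other entries stay at 1.0.
stepped = masked_sgd_step(weights, grad, mask, lr=0.5)

# Rows 0 and 2 become 0.0; row 1 stays at 1.0.
row_stepped = row_sparse_sgd_step(
    weights, row_values=grad[[0, 2]], row_indices=np.array([0, 2]), lr=0.5)
```

If the sparsity pattern is element-wise, as with the indices of w_s, the masked form is the closer match; the row-wise form fits cases such as updating a few rows of an embedding matrix.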

Sufeng Niu asked Dec 15 '16 04:12



1 Answer

I'm not familiar with IndexedSlices or sparse variables, but from what I gather you are trying to apply a gradient update to only certain slices of a Variable. If so, there is an easy workaround. First, extract a copy of the Variable with

weights_copy = tf.Variable(weights_var.initialized_value()) # Copies the current value

Then apply a gradient update to the entire Variable, and merge the two with a scatter op such as tf.scatter_update(), keeping the original or the updated parts wherever you wish.
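
The copy-then-merge idea can be sketched in NumPy as follows. This is only an illustration of the logic, not TensorFlow code: `np.where` stands in for the scatter-based merge, and the `trainable` mask and the learning rate of 1.0 are assumptions made for the example.

```python
import numpy as np

weights = np.array([[1.0, 2.0], [3.0, 4.0]])
grad = np.array([[0.5, 0.5], [0.5, 0.5]])
# Hypothetical mask: True where the gradient update should be kept.
trainable = np.array([[True, False], [False, True]])

weights_copy = weights.copy()      # snapshot taken before the update
updated = weights - 1.0 * grad     # update applied to the whole matrix

# Merge: updated values where trainable, original values elsewhere.
merged = np.where(trainable, updated, weights_copy)
```

Here `merged` holds the updated values on the diagonal and the untouched originals off the diagonal. The cost of this approach is an extra full copy of the weights on every step.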

answered Nov 18 '22 10:11
