 

Embedding vectors not being updated when using Tensorflow on window classification

I am trying to implement a window-based classifier with TensorFlow.

The word embedding matrix is called word_vecs and is initialized randomly (I also tried Xavier initialization).

The ind variable is a vector of the indices of the word vectors in the matrix.

The first layer is the concatenation of config['window_size'] (5) word vectors.

word_vecs = tf.Variable(tf.random_uniform([len(words), config['embed_size']], -1.0, 1.0),dtype=tf.float32)
ind = tf.placeholder(tf.int32,  [None, config['window_size']])
x = tf.concat(1,tf.unpack(tf.nn.embedding_lookup(word_vecs, ind),axis=1))
W0 = tf.Variable(tf.random_uniform([config['window_size']*config['embed_size'], config['hidden_layer']]))
b0 = tf.Variable(tf.zeros([config['hidden_layer']]))
W1 = tf.Variable(tf.random_uniform([config['hidden_layer'], out_layer]))
b1 = tf.Variable(tf.zeros([out_layer]))
y0 = tf.nn.tanh(tf.matmul(x, W0) + b0)
y1 = tf.nn.softmax(tf.matmul(y0, W1) + b1)
y_ = tf.placeholder(tf.float32, [None, out_layer])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y1), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(0.5).minimize(cross_entropy)

And this is how I run the graph:

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(config['iterations']):
    r = random.randint(0, len(sentences)-1)
    inds = generate_windows([w for w, t in sentences[r]])
    # inds now contains an array of n rows by window_size columns
    ys = [one_hot(tags.index(t), len(tags)) for w, t in sentences[r]]
    # ys now contains an array of n rows by output_size columns
    sess.run(train_step, feed_dict={ind: inds, y_: ys})
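(generate_windows and one_hot are simple helpers that are not shown above; roughly, and assuming a padding index of 0 at the sentence edges, which is not important here, they behave like this:)

def generate_windows(sentence_words, window_size=5):
    # Rough sketch: for each word, collect the vocabulary indices of the
    # surrounding window of words, padding with index 0 at the boundaries.
    half = window_size // 2
    rows = []
    for i in range(len(sentence_words)):
        row = []
        for j in range(i - half, i + half + 1):
            if 0 <= j < len(sentence_words):
                row.append(words.index(sentence_words[j]))
            else:
                row.append(0)
        rows.append(row)
    return rows

def one_hot(index, size):
    # Rough sketch: one-hot row vector of the given size.
    return [1.0 if k == index else 0.0 for k in range(size)]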

The dimensions work out, and the code runs.

However, the accuracy is near zero, and I suspect that the word vectors aren't being updated properly.

How can I make TensorFlow update the word vectors back from the concatenated window form?

asked Nov 23 '25 by Uri Goren

1 Answer

Your embeddings are initialised with tf.Variable, which is trainable by default, so they will be updated. The problem is more likely in the way you are calculating the loss. Look at the following lines:

y1 = tf.nn.softmax(tf.matmul(y0, W1) + b1)
y_ = tf.placeholder(tf.float32, [None, out_layer])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y1), reduction_indices=[1])) 

Here you are calculating the softmax function, which converts the scores into probabilities:

softmax(x_i) = exp(x_i) / sum_j exp(x_j)

If the denominator here becomes too large or too small, this function becomes numerically unstable. To avoid that, a small epsilon is usually added, as below:

[softmax formula with a small epsilon added for numerical stability]

You can see that even after adding an epsilon, the softmax function's value remains essentially the same. If you don't handle this yourself, the gradients may fail to update properly due to vanishing or exploding values.
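For example, in plain NumPy you can see how the manual log(softmax) blows up when one probability underflows to zero, and how a small epsilon keeps it finite:

import numpy as np

# One logit much smaller than the other: its softmax probability underflows
# to exactly 0, and log(0) = -inf, which poisons the loss and its gradients.
logits = np.array([0.0, -800.0])
probs = np.exp(logits) / np.exp(logits).sum()
print(probs)                # [1. 0.]  -- the second probability underflowed
print(np.log(probs))        # [  0. -inf]

# Adding a small epsilon before taking the log keeps the value finite.
eps = 1e-8
print(np.log(probs + eps))  # [ ~1e-08  -18.42]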

Avoid those three lines of code and use the TensorFlow version, tf.nn.sparse_softmax_cross_entropy_with_logits.

Note that this function calculates the softmax internally. It is advisable to use it instead of calculating the loss manually. You can use it as follows:

y1 = tf.matmul(y0, W1) + b1
y_ = tf.placeholder(tf.int32, [None])  # integer class indices, not one-hot vectors
cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y1, labels=y_))
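Note that the sparse version expects the labels to be integer class indices of shape [None] rather than one-hot rows, so the feed in your training loop would change to something like:

ys = [tags.index(t) for w, t in sentences[r]]   # integer tag indices instead of one_hot(...)
sess.run(train_step, feed_dict={ind: inds, y_: ys})

If you prefer to keep the one-hot targets, tf.nn.softmax_cross_entropy_with_logits is the corresponding dense variant.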
answered Nov 25 '25 by Kashyap