I am trying to implement a window-based classifier with TensorFlow.
The word embedding matrix is called word_vecs and is initialized randomly (I also tried Xavier initialization).
The ind placeholder holds the indices of the word vectors in that matrix, one window per row.
The first layer is the concatenation of config['window_size'] (5) word vectors.
word_vecs = tf.Variable(tf.random_uniform([len(words), config['embed_size']], -1.0, 1.0),dtype=tf.float32)
ind = tf.placeholder(tf.int32, [None, config['window_size']])
x = tf.concat(1,tf.unpack(tf.nn.embedding_lookup(word_vecs, ind),axis=1))
W0 = tf.Variable(tf.random_uniform([config['window_size']*config['embed_size'], config['hidden_layer']]))
b0 = tf.Variable(tf.zeros([config['hidden_layer']]))
W1 = tf.Variable(tf.random_uniform([config['hidden_layer'], out_layer]))
b1 = tf.Variable(tf.zeros([out_layer]))
y0 = tf.nn.tanh(tf.matmul(x, W0) + b0)
y1 = tf.nn.softmax(tf.matmul(y0, W1) + b1)
y_ = tf.placeholder(tf.float32, [None, out_layer])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y1), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(0.5).minimize(cross_entropy)
And this is how I run the graph:
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(config['iterations']):
    r = random.randint(0, len(sentences) - 1)
    inds = generate_windows([w for w, t in sentences[r]])
    # inds now contains an array of n rows by window_size columns
    ys = [one_hot(tags.index(t), len(tags)) for w, t in sentences[r]]
    # ys now contains an array of n rows by output_size columns
    sess.run(train_step, feed_dict={ind: inds, y_: ys})
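The helpers generate_windows and one_hot are not shown in the question. A minimal sketch of what they might look like follows, purely as an assumption: the vocabulary list, the padding token at index 0, and the centred window are all hypothetical choices, not the asker's actual code.

```python
# Hypothetical helpers: the question does not show its own versions.
words = ["<pad>", "the", "cat", "sat"]   # assumed vocabulary; index 0 is padding
window_size = 5                          # stands in for config['window_size']

def generate_windows(sentence_words, vocab=words, size=window_size):
    """Return one row of `size` vocabulary indices per word, centred on it."""
    half = size // 2
    ids = [vocab.index(w) for w in sentence_words]   # raises ValueError on unknown words
    padded = [0] * half + ids + [0] * half           # pad both ends with the <pad> index
    return [padded[i:i + size] for i in range(len(ids))]

def one_hot(index, length):
    """Return a list of `length` zeros with a 1.0 at position `index`."""
    row = [0.0] * length
    row[index] = 1.0
    return row

inds = generate_windows(["the", "cat", "sat"])
```

With helpers of this shape, inds has one row per word in the sentence and window_size columns, which matches the [None, config['window_size']] placeholder.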
The dimensions work out, and the code runs.
However, the accuracy is near zero, and I suspect that the word vectors aren't being updated properly.
How can I make TensorFlow propagate updates back to the word vectors through the concatenated window form?
Your embeddings are created with tf.Variable, which is trainable by default, so they will be updated. The problem is more likely in the way you calculate the loss. Look at the following lines:
y1 = tf.nn.softmax(tf.matmul(y0, W1) + b1)
y_ = tf.placeholder(tf.float32, [None, out_layer])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y1), reduction_indices=[1]))
Here you are calculating the softmax function, which converts the scores into probabilities: softmax(z)_i = exp(z_i) / sum_j exp(z_j).
If the denominator becomes too large or too small, this function can break down numerically. To avoid this instability, a constant epsilon is usually added to every score before exponentiating (typically the negative of the largest score).
Adding the same epsilon to every score leaves the value of the softmax unchanged, because the factor exp(epsilon) cancels between numerator and denominator. If you don't handle this on your own, the gradients may not update properly due to vanishing or exploding gradients.
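The invariance mentioned above can be checked with a few lines of plain Python. This is only an illustrative sketch: the function names naive_softmax and stable_softmax and the example scores are made up for the demonstration, and the shift used is the common choice of subtracting the largest score.

```python
import math

def naive_softmax(scores):
    # Direct definition: exp(z_i) / sum_j exp(z_j)
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(scores):
    # Subtract the largest score first; the result is mathematically identical
    # but every exponent is <= 0, so exp() cannot overflow.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

small = [1.0, 2.0, 3.0]
# For moderate scores both versions agree up to rounding error.
same = all(abs(a - b) < 1e-12
           for a, b in zip(naive_softmax(small), stable_softmax(small)))

large = [1000.0, 1001.0, 1002.0]
try:
    naive_softmax(large)
    naive_overflowed = False
except OverflowError:
    # math.exp(1000.0) is outside the range of a Python float
    naive_overflowed = True

stable_large = stable_softmax(large)  # still a valid probability vector
```

The naive version fails on the large scores while the shifted version returns the same probabilities as for [1.0, 2.0, 3.0], since the two inputs differ only by a constant.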
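Replace those three lines with TensorFlow's built-in loss:
tf.nn.softmax_cross_entropy_with_logits
Note that this function calculates the softmax internally, so pass it the raw scores (logits) rather than the output of tf.nn.softmax. It is advisable to use this instead of calculating the loss manually. The related tf.nn.sparse_softmax_cross_entropy_with_logits expects integer class indices of shape [None] as labels; since your y_ is one-hot, the non-sparse version matches what you feed. You can use it as follows:
y1 = tf.matmul(y0, W1) + b1
y_ = tf.placeholder(tf.float32, [None, out_layer])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y1, labels=y_))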
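To make the label formats concrete, here is a small plain-Python sketch of what a fused softmax cross-entropy computes from raw logits, using the log-sum-exp trick. It is an illustration under assumptions, not TensorFlow's actual implementation; the function names and example values are hypothetical. It shows that a one-hot label (as accepted by tf.nn.softmax_cross_entropy_with_logits) and an integer class index (as accepted by tf.nn.sparse_softmax_cross_entropy_with_logits) describe the same loss.

```python
import math

def log_softmax(logits):
    # log-sum-exp with the max subtracted, so exp() never overflows
    shift = max(logits)
    log_sum = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy_one_hot(logits, one_hot_label):
    # Label is a full row such as [0.0, 0.0, 1.0]
    return -sum(y * lp for y, lp in zip(one_hot_label, log_softmax(logits)))

def cross_entropy_sparse(logits, class_index):
    # Label is just the integer index of the correct class
    return -log_softmax(logits)[class_index]

logits = [1000.0, 1001.0, 1002.0]   # large scores that would break exp() directly
loss_one_hot = cross_entropy_one_hot(logits, [0.0, 0.0, 1.0])
loss_sparse = cross_entropy_sparse(logits, 2)
```

Both calls give the same finite loss even for very large logits, which is the behaviour the manual tf.log(tf.nn.softmax(...)) formulation does not guarantee.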