How to build an embedding layer in Tensorflow RNN?

I'm building an RNN LSTM network to classify texts based on the writers' age (binary classification - young / adult).

It seems like the network does not learn and then suddenly starts overfitting:

[Plot rnn_overfitting: red = train, blue = validation]

One possibility could be that the data representation is not good enough. I just sorted the unique words by their frequency and gave them indices. E.g.:

unknown -> 0
the     -> 1
a       -> 2
.       -> 3
to      -> 4
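
In code, that frequency-based indexing looks roughly like this (a minimal sketch, assuming a pre-tokenized corpus; all names here are illustrative):

from collections import Counter

# hypothetical tokenized corpus
texts = [["the", "cat", "sat", "."], ["the", "dog", "ran", "."]]

# count word frequencies across all texts
counts = Counter(token for text in texts for token in text)

# reserve index 0 for unknown words; assign the rest by descending frequency
word_to_index = {"unknown": 0}
for i, (word, _) in enumerate(counts.most_common(), start=1):
    word_to_index[word] = i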

So I'm trying to replace that with a word embedding. I saw a couple of examples, but I wasn't able to implement it in my code. Most of the examples look like this:

embedding = tf.Variable(tf.random_uniform([vocab_size, hidden_size], -1, 1))
inputs = tf.nn.embedding_lookup(embedding, input_data)

Does this mean we're building a layer that learns the embedding? I thought that one should download some pretrained Word2Vec or GloVe vectors and just use those.
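
(For reference, both approaches exist: the tf.Variable above learns the embedding from scratch during training, while pretrained Word2Vec/GloVe vectors can instead be supplied as the variable's initial value. A minimal sketch, where pretrained_vectors is a hypothetical NumPy array of shape [vocab_size, hidden_size] loaded beforehand:)

import numpy as np
import tensorflow as tf

# stand-in for a matrix parsed from, e.g., a GloVe text file
pretrained_vectors = np.random.rand(vocab_size, hidden_size).astype(np.float32)

# trainable=False keeps the pretrained vectors fixed; use True to fine-tune them
embedding = tf.Variable(pretrained_vectors, trainable=False)
inputs = tf.nn.embedding_lookup(embedding, input_data)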

Anyway, let's say I want to build this embedding layer. If I use these two lines in my code, I get an error:

TypeError: Value passed to parameter 'indices' has DataType float32 not in list of allowed values: int32, int64

So I guess I have to change the input_data type to int32. I did that (it's all indices, after all), and then got this:

TypeError: inputs must be a sequence

I tried wrapping inputs (the argument to tf.contrib.rnn.static_rnn) in a list, [inputs], as suggested in this answer, but that produced another error:

ValueError: Input size (dimension 0 of inputs) must be accessible via shape inference, but saw value None.


Update:

I was unstacking the tensor x before passing it to embedding_lookup. I moved the unstacking to after the embedding lookup.

Updated code:

MIN_TOKENS = 10
MAX_TOKENS = 30
x = tf.placeholder("int32", [None, MAX_TOKENS, 1])
y = tf.placeholder("float", [None, N_CLASSES]) # 0.0 / 1.0
...
seqlen = tf.placeholder(tf.int32, [None]) # list of each sequence's true length*
embedding = tf.Variable(tf.random_uniform([VOCAB_SIZE, HIDDEN_SIZE], -1, 1))
inputs = tf.nn.embedding_lookup(embedding, x) #x is the text after converting to indices
inputs = tf.unstack(inputs, MAX_TOKENS, 1)
outputs, states = tf.contrib.rnn.static_rnn(lstm_cell, inputs, dtype=tf.float32, sequence_length=seqlen) #---> Produces error

*seqlen: I zero-padded the sequences so they all have the same length, but since the actual lengths differ, I prepared a list giving each sequence's length without the padding.
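
A common way to build that length list, assuming 0 is used only for padding (a sketch in NumPy):

import numpy as np

# hypothetical batch of index sequences, zero-padded to length 5
padded = np.array([[5, 12, 7, 0, 0],
                   [3, 9, 0, 0, 0]])

# count the non-padding positions in each row
seqlen_values = np.sum(padded != 0, axis=1)  # -> array([3, 2])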

New error:

ValueError: Input 0 of layer basic_lstm_cell_1 is incompatible with the layer: expected ndim=2, found ndim=3. Full shape received: [None, 1, 64]

64 is the size of each hidden layer.

It's obvious that I have a problem with the dimensions... How can I make the inputs fit the network after embedding?

Alaa M. asked Sep 04 '18

2 Answers

From the documentation of tf.nn.static_rnn, we can see that the inputs argument needs to be:

A length T list of inputs, each a Tensor of shape [batch_size, input_size]

So your code should be something like:

x = tf.placeholder("int32", [None, MAX_TOKENS])
...
inputs = tf.unstack(inputs, axis=1)
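
Putting it together, a minimal sketch of the full shape flow (the constants come from the question; the cell construction is an assumption for completeness):

import tensorflow as tf

VOCAB_SIZE = 10000  # assumed
MAX_TOKENS = 30
HIDDEN_SIZE = 64

x = tf.placeholder(tf.int32, [None, MAX_TOKENS])  # batch of index sequences
seqlen = tf.placeholder(tf.int32, [None])         # true lengths, without padding

embedding = tf.Variable(tf.random_uniform([VOCAB_SIZE, HIDDEN_SIZE], -1, 1))
inputs = tf.nn.embedding_lookup(embedding, x)     # [None, MAX_TOKENS, HIDDEN_SIZE]
inputs = tf.unstack(inputs, axis=1)               # MAX_TOKENS tensors of [None, HIDDEN_SIZE]

lstm_cell = tf.contrib.rnn.BasicLSTMCell(HIDDEN_SIZE)
outputs, states = tf.contrib.rnn.static_rnn(
    lstm_cell, inputs, dtype=tf.float32, sequence_length=seqlen)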

vijay m answered Oct 28 '22

tf.squeeze removes dimensions of size 1 from a tensor. If the end goal is to get each input to shape [None, 64], then adding a line like inputs = tf.squeeze(inputs) would fix the problem.
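
In the updated code from the question, that squeeze would sit between the lookup and the unstack. Passing an explicit axis is safer, so a batch dimension that happens to be 1 is not removed as well (a sketch):

# x has shape [None, MAX_TOKENS, 1], so the lookup yields
# a tensor of shape [None, MAX_TOKENS, 1, HIDDEN_SIZE]
inputs = tf.nn.embedding_lookup(embedding, x)
inputs = tf.squeeze(inputs, axis=2)  # -> [None, MAX_TOKENS, HIDDEN_SIZE]
inputs = tf.unstack(inputs, axis=1)  # MAX_TOKENS tensors of [None, HIDDEN_SIZE]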

Neeyanth Kopparapu answered Oct 28 '22