 

Use LSTM tutorial code to predict next word in a sentence?


I've been trying to understand the sample code that goes with https://www.tensorflow.org/tutorials/recurrent, which you can find at https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py

(Using tensorflow 1.3.0.)

I've summarized (what I think are) the key parts, for my question, below:

size = 200
vocab_size = 10000
layers = 2

# input_.input_data is a 2D tensor [batch_size, num_steps] of
#    word ids, from 1 to 10000

cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(size) for _ in range(2)]
    )

embedding = tf.get_variable(
    "embedding", [vocab_size, size], dtype=tf.float32)
inputs = tf.nn.embedding_lookup(embedding, input_.input_data)
inputs = tf.unstack(inputs, num=num_steps, axis=1)

outputs, state = tf.contrib.rnn.static_rnn(
    cell, inputs, initial_state=self._initial_state)

output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])
softmax_w = tf.get_variable(
    "softmax_w", [size, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())
logits = tf.matmul(output, softmax_w) + softmax_b

# Then calculate loss, do gradient descent, etc.

My biggest question is: how do I use the produced model to actually generate a next-word suggestion, given the first few words of a sentence? Concretely, I imagine the flow is like this, but I cannot get my head around what the code for the commented lines would be:

prefix = ["What", "is", "your"] state = #Zeroes # Call static_rnn(cell) once for each word in prefix to initialize state # Use final output to set a string, next_word print(next_word) 

My sub-questions are:

  • Why use a random (uninitialized, untrained) word-embedding?
  • Why use softmax?
  • Does the hidden layer have to match the dimension of the input (i.e. the dimension of the word2vec embeddings)?
  • How/Can I bring in a pre-trained word2vec model, instead of that uninitialized one?

(I'm asking them all as one question, as I suspect they are all connected, and connected to some gap in my understanding.)

What I was expecting to see here was code that loads an existing set of word2vec word embeddings (e.g. using gensim's KeyedVectors.load_word2vec_format()), converts each word in the input corpus to that representation as each sentence is loaded, and then has the LSTM spit out a vector of the same dimension, so that we can find the most similar word (e.g. using gensim's similar_by_vector(y, topn=1)).
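Something like this sketch is what I had in mind (the embedding file name is just a placeholder, and y stands in for whatever vector the LSTM would produce; this is not what the tutorial actually does):

from gensim.models import KeyedVectors

# Load an existing set of word embeddings (placeholder file name).
wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Convert each word of the prefix to its word2vec representation.
x = [wv[word] for word in ["What", "is", "your"]]

# ...feed x through the LSTM, which would output a vector y of the
# same dimension as the embeddings...
y = x[-1]  # placeholder for the LSTM's output vector

# Map the output vector back to the most similar word.
print(wv.similar_by_vector(y, topn=1))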

Is using softmax saving us from the relatively slow similar_by_vector(y, topn=1) call?


BTW, for the pre-existing word2vec part of my question, Using pre-trained word2vec with LSTM for word generation is a similar question. However, the answers there, currently, are not what I'm looking for. What I'm hoping for is a plain-English explanation that switches the light on for me and plugs whatever gap there is in my understanding. Use pre-trained word2vec in lstm language model? is another similar question.

UPDATE: Predicting next word using the language model tensorflow example and Predicting the next word using the LSTM ptb model tensorflow example are similar questions. However, neither shows the code to actually take the first few words of a sentence, and print out its prediction of the next word. I tried pasting in code from the 2nd question, and from https://stackoverflow.com/a/39282697/841830 (which comes with a github branch), but cannot get either to run without errors. I think they may be for an earlier version of TensorFlow?

ANOTHER UPDATE: Yet another question asking basically the same thing: Predicting Next Word of LSTM Model from Tensorflow Example. It links to Predicting next word using the language model tensorflow example (and, again, the answers there are not quite what I am looking for).

In case it still isn't clear, what I am trying to write is a high-level function called getNextWord(model, sentencePrefix), where model is a previously built LSTM that I've loaded from disk, and sentencePrefix is a string, such as "Open the", and it might return "pod". I then might call it with "Open the pod" and it will return "bay", and so on.

An example (with a character RNN, and using mxnet) is the sample() function shown near the end of https://github.com/zackchase/mxnet-the-straight-dope/blob/master/chapter05_recurrent-neural-networks/simple-rnn.ipynb. You can call sample() during training, but you can also call it after training, and with any sentence you want.

Asked Sep 08 '17 by Darren Cook




1 Answer

Main Question

Loading words

Load custom data instead of using the test set:

reader.py@ptb_raw_data

test_path = os.path.join(data_path, "ptb.test.txt")
test_data = _file_to_word_ids(test_path, word_to_id)  # change this line

test_data should contain word ids (print out word_to_id for a mapping). As an example, it should look like: [1, 52, 562, 246] ...
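For example, to feed your own sentence prefix instead of the test file, you could build that list yourself (a sketch; it assumes word_to_id is the dict built in reader.py and falls back to PTB's "<unk>" token for out-of-vocabulary words):

prefix = "the quick brown fox".split()
unk_id = word_to_id["<unk>"]
test_data = [word_to_id.get(word, unk_id) for word in prefix]
print(test_data)  # a list of ints, e.g. [1, 52, 562, 246]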

Displaying predictions

We need to return the output of the FC layer (logits) in the call to sess.run:

ptb_word_lm.py@PTBModel.__init__

logits = tf.reshape(logits, [self.batch_size, self.num_steps, vocab_size])
self.top_word_id = tf.argmax(logits, axis=2)  # add this line

ptb_word_lm.py@run_epoch

fetches = {
    "cost": model.cost,
    "final_state": model.final_state,
    "top_word_id": model.top_word_id  # add this line
}

Later in the function, vals['top_word_id'] will hold an array of integers with the ID of the top word. Look these IDs up in the inverse of word_to_id (an ID-to-word mapping) to determine the predicted word. I did this a while ago with the small model, and the top-1 accuracy was pretty low (20-30% IIRC), even though the perplexity was what was predicted in the header.
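For example, something like this inside run_epoch, after vals = session.run(fetches, feed_dict) (a sketch; id_to_word is just the inverse of word_to_id, which you would have to build and make available there yourself):

# Build the reverse mapping once, outside the loop.
id_to_word = {i: w for w, i in word_to_id.items()}

# vals["top_word_id"] has shape [batch_size, num_steps].
top_word_ids = vals["top_word_id"]
predicted = [[id_to_word[int(i)] for i in row] for row in top_word_ids]
print(predicted)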

Subquestions

Why use a random (uninitialized, untrained) word-embedding?

You'd have to ask the authors, but in my opinion, training the embeddings makes this more of a standalone tutorial: instead of treating the embedding as a black box, it shows how it works.

Why use softmax?

The final prediction is not determined by the cosine similarity between the word embeddings and the output of the hidden layer. There is an FC layer after the LSTM that converts the hidden state into a score (logit) for every word in the vocabulary; softmax then turns those scores into a probability distribution, and the highest-scoring word is the prediction.

Here's a sketch of the operations and dimensions in the neural net:

word -> one hot code (1 x vocab_size) -> embedding (1 x hidden_size) -> LSTM -> FC layer (1 x vocab_size) -> softmax (1 x vocab_size) 
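So the predicted word comes from an argmax over those vocabulary scores, not from a nearest-neighbour search in embedding space. A sketch, using the logits from the model above (shape [batch_size * num_steps, vocab_size]):

probs = tf.nn.softmax(logits)            # probability for every word in the vocabulary
top_word_id = tf.argmax(logits, axis=1)  # softmax is monotonic, so argmax over the
                                         # logits picks the same word as argmax over probs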

Does the hidden layer have to match the dimension of the input (i.e. the dimension of the word2vec embeddings)?

Technically, no. If you look at the LSTM equations, you'll notice that x (the input) can be any size, as long as the weight matrix is adjusted appropriately.

LSTM equations
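For reference, a standard formulation of the LSTM cell (notation varies between papers). The input x_t only ever appears multiplied by the W_{x\cdot} matrices, whose shape is hidden_size x input_size, so the input dimension is independent of the hidden size:

\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
g_t &= \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}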

How/Can I bring in a pre-trained word2vec model, instead of that uninitialized one?

I don't know, sorry.

Answered Oct 05 '22 by c2huc2hu