I would like to use TensorFlow to generate text and have been modifying the LSTM tutorial (https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#recurrent-neural-networks) code to do this. However, my initial solution seems to generate nonsense, and it does not improve even after training for a long time. I fail to see why. The idea is to start with a zero matrix and then generate one word at a time.
This is the code, to which I've added the two functions below: https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/rnn/ptb/ptb_word_lm.py
The generator looks as follows:
def generate_text(session, m, eval_op):
    state = m.initial_state.eval()
    x = np.zeros((m.batch_size, m.num_steps), dtype=np.int32)
    output = str()
    for i in xrange(m.batch_size):
        for step in xrange(m.num_steps):
            try:
                # Run the batch
                # targets have to be set, but m is the validation model, so it should not train the neural network
                cost, state, _, probabilities = session.run([m.cost, m.final_state, eval_op, m.probabilities],
                                                            {m.input_data: x, m.targets: x, m.initial_state: state})
                # Sample a word-id and add it to the matrix and output
                word_id = sample(probabilities[0,:])
                output = output + " " + reader.word_from_id(word_id)
                x[i][step] = word_id
            except ValueError as e:
                print("ValueError")
    print(output)
I have added the variable "probabilities" to the ptb_model and it is simply a softmax over the logits.
self._probabilities = tf.nn.softmax(logits)
And the sampling:
def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))
I have been working toward the exact same goal, and just got it to work. You have many of the right modifications here, but I think you've missed a few steps.
First, for generating text you need to create a different version of the model which represents only a single timestep. The reason is that we need to sample each output y before we can feed it into the next step of the model. I did this by making a new config which sets num_steps and batch_size both equal to 1.
class SmallGenConfig(object):
    """Small config. for generation"""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 1  # this is the main difference
    hidden_size = 200
    max_epoch = 4
    max_max_epoch = 13
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 1
    vocab_size = 10000
I also added an output_probs property to the model with these lines:
self._output_probs = tf.nn.softmax(logits)
and
@property
def output_probs(self):
    return self._output_probs
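To make concrete what the softmax line computes, here is a minimal NumPy sketch. It is not the TensorFlow implementation; the function name softmax and the example logits are assumptions for illustration, and the [1, vocab_size] shape assumes batch_size and num_steps are both 1 as in the config above.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax: turns a row of scores into a probability distribution."""
    z = np.asarray(logits, dtype=np.float64)
    # Subtracting the row maximum does not change the result but avoids overflow in exp.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits for a 3-word vocabulary, shape [1, vocab_size].
logits = np.array([[2.0, 1.0, 0.1]])
probs = softmax(logits)
print(probs[0])  # one probability per word id; the row sums to 1 up to rounding
```

Each row of the result gives one probability per word id, which is why the generation code later indexes the first row before sampling.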
Then, there are a few differences in my generate_text() function. The first one is that I load saved model parameters from disk using the tf.train.Saver() object. Note that we do this after instantiating the PTBModel with the new config from above.
def generate_text(train_path, model_path, num_sentences):
    gen_config = SmallGenConfig()
    with tf.Graph().as_default(), tf.Session() as session:
        initializer = tf.random_uniform_initializer(-gen_config.init_scale,
                                                    gen_config.init_scale)
        with tf.variable_scope("model", reuse=None, initializer=initializer):
            m = PTBModel(is_training=False, config=gen_config)
        # Restore variables from disk.
        saver = tf.train.Saver()
        saver.restore(session, model_path)
        print("Model restored from file " + model_path)
The second difference is that I get the lookup table from ids to word strings (I had to write this function, see the code below).
        words = reader.get_vocab(train_path)
I set up the initial state the same way you do, but then I set up the initial token in a different manner. I want to use the "end of sentence" token so that I'll start my sentence with the right types of words. I looked through the word index and found that <eos> happens to have index 2 (the ordering is deterministic), so I just hard-coded that in. Finally, I wrap it in a 1x1 NumPy matrix so that it has the right shape for the model inputs.
        state = m.initial_state.eval()
        x = 2  # the id for '<eos>' from the training set
        input = np.matrix([[x]])  # a 2D numpy matrix
Finally, here's the part where we generate sentences. Note that we tell session.run() to compute the output_probs and the final_state. And we give it the input and the state. In the first iteration the input is <eos> and the state is the initial_state, but on subsequent iterations we give as input our last sampled output, and we pass the state along from the last iteration. Note also that we use the words list to look up the word string from the output index.
        text = ""
        count = 0
        while count < num_sentences:
            output_probs, state = session.run([m.output_probs, m.final_state],
                                              {m.input_data: input,
                                               m.initial_state: state})
            x = sample(output_probs[0], 0.9)
            if words[x] == "<eos>":
                text += ".\n\n"
                count += 1
            else:
                text += " " + words[x]
            # now feed this new word as input into the next iteration
            input = np.matrix([[x]])
Then all we have to do is print out the text we accumulated.
        print(text)
    return
That's it for the generate_text() function.
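To show the feed-back loop in isolation, here is a small self-contained sketch that runs without TensorFlow. The names VOCAB, toy_step and generate are hypothetical; toy_step only stands in for the session.run() call and is not a real language model. The point is the shape of the loop: sample one word, then pass both that word and the returned state into the next step.

```python
import numpy as np

VOCAB = ["<eos>", "the", "cat", "sat"]  # hypothetical toy vocabulary

def toy_step(token_id, state):
    """Stand-in for one single-timestep model call: returns (probs, new_state)."""
    new_state = state + 1  # a real LSTM state is a tensor; a counter is enough here
    # Toy rule: strongly prefer the next word in VOCAB, wrapping around to <eos>.
    probs = np.full(len(VOCAB), 0.05)
    probs[(token_id + 1) % len(VOCAB)] = 0.85
    return probs, new_state

def generate(num_sentences, rng):
    token_id, state = VOCAB.index("<eos>"), 0  # start from <eos> with the initial state
    sentences, current = [], []
    while len(sentences) < num_sentences:
        probs, state = toy_step(token_id, state)          # feed last word and last state
        token_id = int(rng.choice(len(VOCAB), p=probs))   # sample before the next step
        if VOCAB[token_id] == "<eos>":
            sentences.append(" ".join(current) + ".")
            current = []
        else:
            current.append(VOCAB[token_id])
    return sentences, state

sentences, steps = generate(3, np.random.default_rng(0))
print("\n".join(sentences))
```

Swapping toy_step for the real model call gives the structure of the loop described above.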
Finally, let me show you the function definition for get_vocab(), which I put in reader.py.
def get_vocab(filename):
    data = _read_words(filename)
    counter = collections.Counter(data)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    words, _ = list(zip(*count_pairs))
    return words
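As a possible alternative to hard-coding the <eos> id, the same ordering can be used to look the id up by name. The sketch below is self-contained and works on an in-memory token list rather than a file; build_vocab and the toy sentence are assumptions for illustration, and it only matches the model's ids if the training reader assigns ids with the same frequency-then-alphabetical sort.

```python
import collections

def build_vocab(tokens):
    """Return (words, word_to_id), ordered by descending count, then alphabetically."""
    counter = collections.Counter(tokens)
    count_pairs = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    words = [word for word, _ in count_pairs]
    word_to_id = {word: i for i, word in enumerate(words)}
    return words, word_to_id

# Hypothetical toy corpus; the real one would come from the training file.
tokens = "the cat sat <eos> the dog sat <eos> the end <eos>".split()
words, word_to_id = build_vocab(tokens)

eos_id = word_to_id["<eos>"]  # look the id up instead of hard-coding it
print(eos_id, words[eos_id])
```

Because the sort is deterministic, the id is stable for a given training set, but it can differ between corpora, which is the reason to look it up.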
The last thing you need to do is save the model after training it, which looks like this:
save_path = saver.save(session, "/tmp/model.ckpt")
And that's the model that you'll load from disk later when generating text.
There was one more problem: I found that sometimes the probability distribution produced by the TensorFlow softmax function didn't sum exactly to 1.0. When the sum was larger than 1.0, np.random.multinomial() raised an error. So I had to write my own sampling function, which looks like this:
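import random
import numpy as np

def sample(a, temperature=1.0):
    # rescale the distribution by the temperature and renormalize
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    r = random.random()  # range: [0,1)
    # walk the cumulative sum and return the first index that exceeds r
    total = 0.0
    for i in range(len(a)):
        total += a[i]
        if total > r:
            return i
    # fall back to the last index if rounding kept the total below r
    return len(a) - 1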
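The same inverse-CDF idea can also be written without the explicit loop. This is only a sketch under a few assumptions: the name sample_cdf, the optional rng argument and the 1e-12 floor (added so that zero probabilities do not produce log(0)) are mine and are not part of the original function. Casting to float64 and renormalizing also sidesteps the case where the float32 probabilities sum to slightly more than 1.0.

```python
import numpy as np

def sample_cdf(probs, temperature=1.0, rng=None):
    """Sample an index from probs after temperature scaling, via the cumulative sum."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(probs, dtype=np.float64)         # float64 keeps rounding error small
    logits = np.log(np.maximum(p, 1e-12)) / temperature
    logits -= logits.max()                          # stabilise exp()
    p = np.exp(logits)
    p /= p.sum()                                    # renormalize after scaling
    cdf = np.cumsum(p)
    r = rng.random()                                # uniform in [0, 1)
    idx = int(np.searchsorted(cdf, r, side="right"))  # first index with cdf[idx] > r
    return min(idx, len(p) - 1)                     # guard against cdf[-1] < 1 by rounding

# Usage with a hypothetical 4-word distribution.
word_id = sample_cdf(np.array([0.1, 0.2, 0.6, 0.1], dtype=np.float32), temperature=0.9)
print(word_id)
```

Lower temperatures concentrate the samples on the most likely words, while a temperature of 1.0 samples from the model's distribution unchanged.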
When you put all this together, the small model was able to generate me some cool sentences. Good luck.