
Why does my Keras LSTM model get stuck in an infinite loop?

I am trying to build a small LSTM that can learn to write code (even if it's garbage code) by training it on existing Python code. I concatenated several hundred files, a few thousand lines of code in total, into one training file, with each file's contents followed by <eos> to signify "end of sequence".

As an example, my training file looks like:


setup(name='Keras',
...
      ],
      packages=find_packages())
<eos>
import pyux
...
with open('api.json', 'w') as f:
    json.dump(sign, f)
<eos>

I am creating tokens from the words with:

with open(self.textfile, 'r') as file:
    filecontents = file.read()

# make newlines and indentation their own whitespace-delimited tokens
filecontents = filecontents.replace("\n\n", "\n")
filecontents = filecontents.replace('\n', ' \n ')
filecontents = filecontents.replace('    ', ' \t ')

text_in_words = [w for w in filecontents.split(' ') if w != '']

self._words = set(text_in_words)

# slide a window of seq_length words over the text; the word that
# follows each window is the prediction target
STEP = 1
self._codelines = []
self._next_words = []
for i in range(0, len(text_in_words) - self.seq_length, STEP):
    self._codelines.append(text_in_words[i: i + self.seq_length])
    self._next_words.append(text_in_words[i + self.seq_length])
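
The windows are then vectorized along these lines, mapping every word to a fixed integer id so the Embedding layer and the sparse labels share one vocabulary (word_indices, X, and y are illustrative names):

import numpy as np

# hypothetical id mapping: <eos>, \n and \t get ids like any other word
word_indices = {w: i for i, w in enumerate(sorted(self._words))}

X = np.array([[word_indices[w] for w in line] for line in self._codelines])  # (samples, seq_length)
y = np.array([word_indices[w] for w in self._next_words])                    # (samples,)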

My Keras model is:

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense, Activation

model = Sequential()
model.add(Embedding(input_dim=len(self._words), output_dim=1024))

model.add(Bidirectional(
    LSTM(128), input_shape=(self.seq_length, len(self._words))))

model.add(Dropout(rate=0.5))
model.add(Dense(len(self._words)))
model.add(Activation('softmax'))

model.compile(loss='sparse_categorical_crossentropy',
              optimizer="adam", metrics=['accuracy'])

But no matter how long I train it, the model never generates <eos> or even \n. I suspect it might be because my LSTM size is 128 while my seq_length is 200, but that doesn't quite make sense. Is there something I'm missing?

asked May 19 '19 by Shamoon




1 Answer

Sometimes, when there is no limit on how much text is generated, or when the <EOS> and <SOS> markers are not mapped to fixed numerical tokens, the LSTM never converges. If you could post your outputs or error messages, it would be much easier to debug.

You could create a helper class that indexes words and sentences, reserving fixed ids for <SOS> and <EOS>:

# tokens for start of sentence (SOS) and end of sentence (EOS)
SOS_token = 0
EOS_token = 1


class Lang:
    '''
    Vocabulary helper: stores word-to-index and index-to-word
    mappings along with word counts.
    '''
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1
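
For example, feeding it a couple of tokenized lines (the sample sentences are just stand-ins for your training text):

lang = Lang('python-code')
for sentence in ["import numpy as np", "print ( x )"]:
    lang.addSentence(sentence)

print(lang.n_words)                 # vocabulary size, including SOS and EOS
print(lang.word2index['import'])    # stable integer id for a token
print(lang.index2word[EOS_token])   # 'EOS'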

Then, while generating text, prepend the <SOS> token and stop as soon as <EOS> is produced. You can use https://github.com/sherjilozair/char-rnn-tensorflow , a character-level RNN, for reference.
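
As a rough sketch, a bounded sampling loop could look like this (model, lang, and seed_ids are placeholders; the important parts are the hard max_len cap and the break on EOS):

import numpy as np

def generate(model, lang, seed_ids, max_len=200, temperature=1.0):
    ids = [SOS_token] + list(seed_ids)
    for _ in range(max_len):                        # hard cap: generation can never loop forever
        probs = model.predict(np.array([ids]))[0]   # softmax over the vocabulary
        probs = np.log(probs + 1e-9) / temperature  # temperature sampling instead of argmax
        probs = np.exp(probs) / np.sum(np.exp(probs))
        next_id = int(np.random.choice(len(probs), p=probs))
        if next_id == EOS_token:                    # stop as soon as EOS is produced
            break
        ids.append(next_id)
    return ' '.join(lang.index2word[i] for i in ids[1:])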

answered Oct 04 '22 by ASHu2