Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Load saved checkpoint and predict not producing same results as in training

I'm training based on a sample code I found on the Internet. The accuracy in testing is at 92% and the checkpoints are saved in a directory. In parallel (the training is running for 3 days now) I want to create my prediction code so I can learn more instead of just waiting.

This is my third day of deep learning so I probably don't know what I'm doing. Here's how I'm trying to predict:

  • Instantiate the model using the same code as in training
  • Load the last checkpoint
  • Try to predict

The code works but the results are nowhere near 90%.

Here's how I create the model:

INPUT_LAYERS = 2
OUTPUT_LAYERS = 2
AMOUNT_OF_DROPOUT = 0.3
HIDDEN_SIZE = 700
INITIALIZATION = "he_normal"  # : Gaussian initialization scaled by fan_in (He et al., 2014)
CHARS = list("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .")

def generate_model(output_len, chars=None):
    """Generate the model"""
    print('Build model...')
    chars = chars or CHARS
    model = Sequential()
    # "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE
    # note: in a situation where your input sequences have a variable length,
    # use input_shape=(None, nb_feature).
    for layer_number in range(INPUT_LAYERS):
        model.add(recurrent.LSTM(HIDDEN_SIZE, input_shape=(None, len(chars)), init=INITIALIZATION,
                         return_sequences=layer_number + 1 < INPUT_LAYERS))
        model.add(Dropout(AMOUNT_OF_DROPOUT))
    # For the decoder's input, we repeat the encoded input for each time step
    model.add(RepeatVector(output_len))
    # The decoder RNN could be multiple layers stacked or a single layer
    for _ in range(OUTPUT_LAYERS):
        model.add(recurrent.LSTM(HIDDEN_SIZE, return_sequences=True, init=INITIALIZATION))
        model.add(Dropout(AMOUNT_OF_DROPOUT))

    # For each of step of the output sequence, decide which character should be chosen
    model.add(TimeDistributed(Dense(len(chars), init=INITIALIZATION)))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

In a separate file predict.py I import this method to create my model and try to predict:

...import code
model = generate_model(len(question), dataset['chars'])
model.load_weights('models/weights.204-0.20.hdf5')

def decode(pred):
    return character_table.decode(pred, calc_argmax=False)


x = np.zeros((1, len(question), len(dataset['chars'])))
for t, char in enumerate(question):
    x[0, t, character_table.char_indices[char]] = 1.

preds = model.predict_classes([x], verbose=0)[0]

print("======================================")
print(decode(preds))

I don't know what the problem is. I have about 90 checkpoints in my directory and I'm loading the last one based on accuracy. All of them saved by a ModelCheckpoint:

checkpoint = ModelCheckpoint(MODEL_CHECKPOINT_DIRECTORYNAME + '/' + MODEL_CHECKPOINT_FILENAME,
                         save_best_only=True)

I'm stuck. What am I doing wrong?

like image 560
Romeo Mihalcea Avatar asked Sep 06 '17 12:09

Romeo Mihalcea


People also ask

How do you save models with ModelCheckpoint?

Callback to save the Keras model or model weights at some frequency. ModelCheckpoint callback is used in conjunction with training using model. fit() to save a model or weights (in a checkpoint file) at some interval, so the model or weights can be loaded later to continue the training from the state saved.

How do you continue training in Tensorflow?

To continue training a loaded model with checkpoints, we simply rerun the model. fit function with the callback still parsed. This however overwrites the currently saved best model, so make sure to change the checkpoint file path if this is undesired.


1 Answers

In the repo you provided, the training and validation sentences are inverted before being fed into the model (as commonly done in seq2seq learning).

dataset = DataSet(DATASET_FILENAME)

As you can see, the default value for inverted is True, and the questions are inverted.

class DataSet(object):
    def __init__(self, dataset_filename, test_set_fraction=0.1, inverted=True):
        self.inverted = inverted

    ...

        question = question[::-1] if self.inverted else question
        questions.append(question)

You can try to invert the sentences during prediction. Specifically,

x = np.zeros((1, len(question), len(dataset['chars'])))
for t, char in enumerate(question):
    x[0, len(question) - t - 1, character_table.char_indices[char]] = 1.
like image 99
Yu-Yang Avatar answered Oct 02 '22 23:10

Yu-Yang