
0% accuracy with evaluate_generator but 75% accuracy during training with same data - what is going on?

I'm encountering a very strange problem with a Keras model using ImageDataGenerator, fit_generator, and evaluate_generator.

I'm creating the model like so:

classes = <list of classes>
num_classes = len(classes)

pretrained_model = Sequential()
pretrained_model.add(ResNet50(include_top=False, weights='imagenet', pooling='avg'))
pretrained_model.add(Dense(num_classes, activation='softmax'))

pretrained_model.layers[0].trainable = False

pretrained_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

And I'm training it like this:

idg_final = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    rotation_range=15,
)

training_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224), class_mode='categorical')

pretrained_model.fit_generator(training_gen, epochs=1, verbose=1)

fit_generator prints loss: 1.0297 - acc: 0.7546.

Then, I am trying to evaluate the model on the exact same data it was trained on.

debug_gen = idg_final.flow_from_directory('./train', target_size=(224, 224), class_mode='categorical', classes=classes, shuffle=True)
print(pretrained_model.evaluate_generator(debug_gen, steps=100))

Which prints [10.278913383483888, 0.0].

Why is the accuracy so different on the same exact data?

Edit: I also wanted to point out that sometimes the accuracy is above 0.0. For example, when I use a model trained for five epochs, evaluate_generator returns 6% accuracy.


Edit 2: Based on the answers below I made sure to train for more epochs and that the ImageDataGenerator for evaluation did not have random shifts and rotations. I'm still getting very high accuracy during training and extremely low accuracy during evaluation on the same dataset.

I'm training like

idg_final = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    rotation_range=15,
)

training_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224), class_mode='categorical')

pretrained_model.fit_generator(training_gen, epochs=10, verbose=1)

Which prints the following:

Found 9850 images belonging to 4251 classes.
Epoch 1/10
308/308 [==============================] - 3985s 13s/step - loss: 8.9218 - acc: 0.0860
Epoch 2/10
308/308 [==============================] - 3555s 12s/step - loss: 3.2710 - acc: 0.3403
Epoch 3/10
308/308 [==============================] - 3594s 12s/step - loss: 1.8597 - acc: 0.5836
Epoch 4/10
308/308 [==============================] - 3656s 12s/step - loss: 1.2712 - acc: 0.7058
Epoch 5/10
308/308 [==============================] - 3667s 12s/step - loss: 0.9556 - acc: 0.7795
Epoch 6/10
308/308 [==============================] - 3689s 12s/step - loss: 0.7665 - acc: 0.8207
Epoch 7/10
308/308 [==============================] - 3693s 12s/step - loss: 0.6581 - acc: 0.8498
Epoch 8/10
308/308 [==============================] - 3618s 12s/step - loss: 0.5874 - acc: 0.8636
Epoch 9/10
308/308 [==============================] - 3823s 12s/step - loss: 0.5144 - acc: 0.8797
Epoch 10/10
308/308 [==============================] - 4334s 14s/step - loss: 0.4835 - acc: 0.8854

And I'm evaluating like this on the exact same dataset

idg_debug = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
)

debug_gen = idg_debug.flow_from_directory('./train', target_size=(224, 224), class_mode='categorical', classes=classes)
print(pretrained_model.evaluate_generator(debug_gen))

Which prints the following very low accuracy: [10.743386410747084, 0.0001015228426395939]


The full code is here.

Asked Apr 25 '18 by Ben Sandler

1 Answer

Two things I suspect.

1 - No, your data is not the same.

You're using three types of augmentation in ImageDataGenerator, and no random seed is being set. So the test data is not equal to the training data.
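To see why unseeded augmentation makes the two passes differ, here is a minimal numpy sketch (a hypothetical stand-in for ImageDataGenerator's random shifts, not the Keras implementation itself):

```python
import numpy as np

# Hypothetical stand-in for ImageDataGenerator's random width shifts:
# each pass over the data draws fresh random offsets per image.
def random_shifts(n_images, seed=None):
    rng = np.random.RandomState(seed)
    return rng.uniform(-0.2, 0.2, size=n_images)  # like width_shift_range=0.2

# Without a seed, the training pass and the evaluation pass
# see differently shifted images:
print(np.allclose(random_shifts(5), random_shifts(5)))  # False: independent draws

# With the same seed, both passes see identical augmentations:
train_pass = random_shifts(5, seed=42)
eval_pass = random_shifts(5, seed=42)
print(np.allclose(train_pass, eval_pass))  # True
```

This is the same reason the `seed` argument of flow_from_directory makes the evaluation images reproducible.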

And it seems you're also training for only one epoch, which is very little (unless you really have tons of data, but since you're using augmentation, maybe that's not the case). (PS: I don't see the steps_per_epoch argument in your fit_generator call...)

So, if you want to see good results, here are some solutions:

  • remove the augmentation arguments from the generator for this test (for both training and test data) - this means removing width_shift_range, height_shift_range and rotation_range;
  • or train for much longer, long enough for your model to really get used to all kinds of augmented images (five epochs still seem to be far too few);
  • or set a random seed and guarantee that the test data is equal to the training data (argument seed in flow_from_directory).

2 - (This may happen if you're very new to Keras/programming, so please ignore if it's not the case) You might be running the code that defines the model again when testing.

If you run the code that defines the model again, it will replace all your previous training with random weights.

3 - Since we're out of suggestions:

Maybe save the weights instead of saving the model. I usually do this instead of saving the models. (For some reason I've never understood, I've never been able to load a model like that.)

def createModel():
    ....

model = createModel()
...
model.fit_generator(....)

np.save('model_weights.npy', model.get_weights())

model = createModel()
# recent numpy versions refuse to load pickled object arrays by default,
# so pass allow_pickle=True when loading
model.set_weights(np.load('model_weights.npy', allow_pickle=True))
...
model.evaluate_generator(...)
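Since get_weights() returns a plain Python list of arrays, numpy has to pickle it, and loading pickled data needs allow_pickle=True on recent numpy versions. A minimal round-trip sketch with dummy arrays standing in for the model's weights:

```python
import numpy as np

# Stand-in for model.get_weights(): a list of arrays with differing shapes.
weights = [np.ones((3, 4)), np.zeros(4)]

# Wrapping the list in an object array lets numpy serialize it via pickle.
np.save('model_weights.npy', np.array(weights, dtype=object), allow_pickle=True)

# Loading pickled object arrays requires allow_pickle=True (numpy >= 1.16.3).
restored = list(np.load('model_weights.npy', allow_pickle=True))

print(all(np.array_equal(a, b) for a, b in zip(weights, restored)))  # True
```

The restored list can then be fed straight to model.set_weights().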

Hint:

It's not related to the bug, but make sure that the base model is really layer 0. If I remember correctly, sequential models have an input layer, so you may actually need to make layer 1 untrainable instead.

Use model.summary() to confirm the number of non-trainable parameters.

Answered Sep 28 '22 by Daniel Möller