I'm encountering a very strange issue with a Keras model using ImageDataGenerator, fit_generator, and evaluate_generator.
I'm creating the model like so:
from keras.models import Sequential
from keras.layers import Dense
from keras.applications import ResNet50

classes = <list of classes>
num_classes = len(classes)

pretrained_model = Sequential()
pretrained_model.add(ResNet50(include_top=False, weights='imagenet', pooling='avg'))
pretrained_model.add(Dense(num_classes, activation='softmax'))

# Freeze the ResNet50 base (set before compile so it takes effect)
pretrained_model.layers[0].trainable = False

pretrained_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
And I'm training it like this:
from keras.preprocessing.image import ImageDataGenerator

idg_final = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=15,
)

traing_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224), class_mode='categorical')

pretrained_model.fit_generator(traing_gen, epochs=1, verbose=1)
fit_generator prints loss: 1.0297 - acc: 0.7546.
Then, I am trying to evaluate the model on the exact same data it was trained on.
debug_gen = idg_final.flow_from_directory('./train', target_size=(224, 224), class_mode='categorical', classes=classes, shuffle=True)
print(pretrained_model.evaluate_generator(debug_gen, steps=100))
Which prints [10.278913383483888, 0.0].
Why is the accuracy so different on the same exact data?
Edit: I also wanted to point out that sometimes the accuracy is above 0.0. For example, when I use a model trained for five epochs, evaluate_generator returns 6% accuracy.
Edit 2: Based on the answers below, I made sure to train for more epochs and that the ImageDataGenerator used for evaluation did not have random shifts and rotations. I'm still getting very high accuracy during training and extremely low accuracy during evaluation on the same dataset.

I'm training like this:
idg_final = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
    width_shift_range=0.2,
    height_shift_range=0.2,
    rotation_range=15,
)

traing_gen = idg_final.flow_from_directory('./train', classes=classes, target_size=(224, 224), class_mode='categorical')

pretrained_model.fit_generator(traing_gen, epochs=10, verbose=1)
Which prints the following:
Found 9850 images belonging to 4251 classes.
Epoch 1/10
308/308 [==============================] - 3985s 13s/step - loss: 8.9218 - acc: 0.0860
Epoch 2/10
308/308 [==============================] - 3555s 12s/step - loss: 3.2710 - acc: 0.3403
Epoch 3/10
308/308 [==============================] - 3594s 12s/step - loss: 1.8597 - acc: 0.5836
Epoch 4/10
308/308 [==============================] - 3656s 12s/step - loss: 1.2712 - acc: 0.7058
Epoch 5/10
308/308 [==============================] - 3667s 12s/step - loss: 0.9556 - acc: 0.7795
Epoch 6/10
308/308 [==============================] - 3689s 12s/step - loss: 0.7665 - acc: 0.8207
Epoch 7/10
308/308 [==============================] - 3693s 12s/step - loss: 0.6581 - acc: 0.8498
Epoch 8/10
308/308 [==============================] - 3618s 12s/step - loss: 0.5874 - acc: 0.8636
Epoch 9/10
308/308 [==============================] - 3823s 12s/step - loss: 0.5144 - acc: 0.8797
Epoch 10/10
308/308 [==============================] - 4334s 14s/step - loss: 0.4835 - acc: 0.8854
And I'm evaluating like this on the exact same dataset:
idg_debug = ImageDataGenerator(
    data_format='channels_last',
    rescale=1./255,
)
debug_gen = idg_debug.flow_from_directory('./train', target_size=(224, 224), class_mode='categorical', classes=classes)
print(pretrained_model.evaluate_generator(debug_gen))
Which prints the following very low accuracy: [10.743386410747084, 0.0001015228426395939]
The full code is here.
Two things I suspect.
1 - No, your data is not the same.
You're using three types of augmentation in ImageDataGenerator, and it seems there isn't a random seed being set. So, the test data is not equal to the training data.

And as it seems, you're also training for only one epoch, which is very little (unless you really have tons of data, but since you're using augmentation, maybe that's not the case). (PS: I don't see the steps_per_epoch argument in your fit_generator call...)
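For reference, a fit_generator call with an explicit steps_per_epoch looks like the sketch below; it reuses the traing_gen iterator from the question, and relies on the samples and batch_size attributes that flow_from_directory iterators expose in Keras 2:

pretrained_model.fit_generator(
    traing_gen,
    steps_per_epoch=traing_gen.samples // traing_gen.batch_size,  # one full pass per epoch
    epochs=10,
    verbose=1,
)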
So, if you want to see good results, here are some solutions:
- remove the random augmentations (width_shift_range, height_shift_range and rotation_range) from the generator used for this test;
- or set a random seed and guarantee the test data is equal to the training data (argument seed in flow_from_directory) - a minimal sketch of this is shown after point 2 below.

2 - (This may happen if you're very new to Keras/programming, so please ignore if it's not the case) You might be running the code that defines the model again when testing.
If you run the code that defines the model again, it will replace all your previous training with random weights.
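Here is a minimal sketch of the fixes from point 1, assuming the same ./train directory and classes list as in the question: an evaluation generator with no random augmentations, shuffle turned off for a deterministic order, and a fixed seed (the seed only matters when shuffle or augmentation is in play). In Keras 2 the iterator supports len(), which gives the number of batches in one pass:

from keras.preprocessing.image import ImageDataGenerator

# Only rescaling - no width_shift_range / height_shift_range / rotation_range,
# so evaluation sees the unaugmented images.
idg_eval = ImageDataGenerator(data_format='channels_last', rescale=1./255)

eval_gen = idg_eval.flow_from_directory(
    './train',
    classes=classes,
    target_size=(224, 224),
    class_mode='categorical',
    shuffle=False,
    seed=42,
)

# Evaluate exactly one full pass over the data
print(pretrained_model.evaluate_generator(eval_gen, steps=len(eval_gen)))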
3 - Since we're out of suggestions:
Maybe save the weights instead of saving the whole model. I usually do this instead of saving models. (For some reason I don't understand, I've never been able to load a model saved that way.)
import numpy as np

def createModel():
    ....

model = createModel()
...
model.fit_generator(....)

# Save just the trained weights (a list of numpy arrays)
np.save('model_weights.npy', model.get_weights())

# Rebuild the model from scratch and restore the trained weights
model = createModel()
model.set_weights(np.load('model_weights.npy'))  # newer numpy versions need allow_pickle=True here
...
model.evaluate_generator(...)
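For what it's worth, Keras also ships built-in weight serialization to HDF5 (it requires h5py), which avoids the numpy round trip; the 'model_weights.h5' filename here is just an example:

model.save_weights('model_weights.h5')

model = createModel()
model.load_weights('model_weights.h5')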
Hint: It's not related to the bug, but make sure that the base model layer is really layer 0. If I remember well, sequential models have an input layer, and you should actually be making layer 1 untrainable instead.

Use model.summary() to confirm the number of non-trainable parameters.
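As a quick sanity check (a sketch using the pretrained_model from the question), you can print each layer's index, name, and trainable flag, then confirm the count that summary() reports; remember that trainable must be set before compile() for it to take effect:

for i, layer in enumerate(pretrained_model.layers):
    print(i, layer.name, layer.trainable)

pretrained_model.summary()  # check the "Non-trainable params" line at the bottom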