Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

tensorflow:Can save best model only with val_acc available, skipping

I have an issue with tf.callbacks.ModelChekpoint. As you can see in my log file, the warning comes always before the last iteration where the val_acc is calculated. Therefore, Modelcheckpoint never finds the val_acc

Epoch 1/30
1/8 [==>...........................] - ETA: 19s - loss: 1.4174 - accuracy: 0.3000
2/8 [======>.......................] - ETA: 8s - loss: 1.3363 - accuracy: 0.3500 
3/8 [==========>...................] - ETA: 4s - loss: 1.3994 - accuracy: 0.2667
4/8 [==============>...............] - ETA: 3s - loss: 1.3527 - accuracy: 0.3250
6/8 [=====================>........] - ETA: 1s - loss: 1.3042 - accuracy: 0.3333
WARNING:tensorflow:Can save best model only with val_acc available, skipping.
8/8 [==============================] - 4s 482ms/step - loss: 1.2846 - accuracy: 0.3375 - val_loss: 1.3512 - val_accuracy: 0.5000

Epoch 2/30
1/8 [==>...........................] - ETA: 0s - loss: 1.0098 - accuracy: 0.5000
3/8 [==========>...................] - ETA: 0s - loss: 0.8916 - accuracy: 0.5333
5/8 [=================>............] - ETA: 0s - loss: 0.9533 - accuracy: 0.5600
6/8 [=====================>........] - ETA: 0s - loss: 0.9523 - accuracy: 0.5667
7/8 [=========================>....] - ETA: 0s - loss: 0.9377 - accuracy: 0.5714
WARNING:tensorflow:Can save best model only with val_acc available, skipping.
8/8 [==============================] - 1s 98ms/step - loss: 0.9229 - accuracy: 0.5750 - val_loss: 1.2507 - val_accuracy: 0.5000

This is my code for training the CNN.

callbacks = [
        TensorBoard(log_dir=r'C:\Users\reda\Desktop\logs\{}'.format(Name),
                    histogram_freq=1),
        ModelCheckpoint(filepath=r"C:\Users\reda\Desktop\checkpoints\{}".format(Name), monitor='val_acc',
                        verbose=2, save_best_only=True, mode='max')]
history = model.fit_generator(
        train_data_gen, 
        steps_per_epoch=total_train // batch_size,
        epochs=epochs,
        validation_data=val_data_gen,
        validation_steps=total_val // batch_size,
        callbacks=callbacks)
like image 291
Reda El Hail Avatar asked Apr 29 '20 15:04

Reda El Hail


3 Answers

I know how frustrating these things can be sometimes..but tensorflow requires that you explicitly write out the name of metric you are wanting to calculate

You will need to actually say 'val_accuracy'

metric = 'val_accuracy'
ModelCheckpoint(filepath=r"C:\Users\reda.elhail\Desktop\checkpoints\{}".format(Name), monitor=metric,
                    verbose=2, save_best_only=True, mode='max')]

Hope this helps =)

like image 192
Brian Mark Anderson Avatar answered Nov 14 '22 05:11

Brian Mark Anderson


To add to the accepted answer as I just struggled with this. Not only do you have to use the full the metric name, it must match for your model.compile, ModelCheckpoint, and EarlyStopping. I had one set to accuracy and the other two set to val_accuracy and it did not work.

like image 34
BlueTurtle Avatar answered Nov 14 '22 05:11

BlueTurtle


Print the metrics after training for one epoch like below. This will print the metrics defined for your model.

hist = model.fit(...)
for key in hist.history:
print(key)

Now replace them in your metrics. It will work like charm.

This hack was given by the gentleman in the below link. Thanks to him!! https://github.com/tensorflow/tensorflow/issues/33163#issuecomment-540451749

like image 2
Sachin Mohan Avatar answered Nov 14 '22 06:11

Sachin Mohan