Keras CNN: train and validation sets are identical but with different accuracy

I know this is a very bad thing to do, but I noticed something strange using Keras MobileNet:

I use the same data for the training and validation sets:

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(IM_WIDTH, IM_HEIGHT),
    batch_size=batch_size,
    class_mode="categorical"
)
validation_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(IM_WIDTH, IM_HEIGHT),
    class_mode="categorical"
)

but I don't get the same accuracy on both!

epoch 30/30 - loss: 0.3485 - acc: 0.8938 - val_loss: 1.7545 - val_acc: 0.4406

It seems that I am overfitting the training set compared to the validation set... but they are supposed to be the same! How is that possible?

Arcyno asked Aug 29 '18

People also ask

Why is my validation accuracy higher than my training accuracy?

The training loss is higher because regularization such as dropout makes it artificially harder for the network to give the right answers during training. However, during validation all of the units are available, so the network has its full computational power, and thus it might perform better than in training (see the sketch below).
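A minimal sketch of how this can look in Keras (the model below is hypothetical, not taken from the question): the Dropout layer randomly disables units while training, but is inactive when the model is evaluated, so training metrics can look worse than validation metrics even on identical data.

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation="relu", input_shape=(20,)),
    Dropout(0.5),  # active only during training, disabled at evaluation time
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["acc"])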

Why is validation accuracy less than training accuracy?

If your model's accuracy on your testing data is lower than your training or validation accuracy, it usually indicates that there are meaningful differences between the kind of data you trained the model on and the testing data you're providing for evaluation.

Does more training data always result in higher validation accuracy?

The validation and test accuracies can be slightly greater than the training accuracy. This can happen (e.g. because the validation or test examples come from a distribution on which the model actually performs better), although it usually doesn't.

Does Keras train on validation data?

Keras can separate a portion of your training data into a validation dataset and evaluate the performance of your model on that validation dataset in each epoch; the model is never trained on that held-out portion. A sketch follows below.
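A minimal sketch, assuming a compiled model and in-memory NumPy arrays x_train / y_train (validation_split works with array inputs, not with generators):

# hold out the last 20% of the arrays as a validation set;
# Keras evaluates on it after every epoch but never trains on it
model.fit(x_train, y_train,
          epochs=30,
          batch_size=32,
          validation_split=0.2)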


1 Answer

The training loss is calculated on the fly, as a running average over the batches of the current epoch, while the validation loss is only calculated after the epoch has been trained. So at the beginning, a nearly untrained net makes the training loss look worse than it actually is. This effect should vanish in later epochs, since by then a single epoch's impact on the score is not that big anymore.

This behaviour is addressed in the Keras FAQ. If you evaluate both at the end of the epoch with a self-written callback, they should be the same (see the sketch below).
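A minimal sketch of such a callback, assuming the compiled model and the train_generator / validation_generator from the question, and using the fit_generator / evaluate_generator API of that Keras era:

from keras.callbacks import Callback

class EvaluateBoth(Callback):
    def on_epoch_end(self, epoch, logs=None):
        # evaluate the frozen end-of-epoch weights on both generators
        train_scores = self.model.evaluate_generator(train_generator,
                                                     steps=len(train_generator))
        val_scores = self.model.evaluate_generator(validation_generator,
                                                   steps=len(validation_generator))
        print("epoch %d - train: %s - val: %s" % (epoch + 1, train_scores, val_scores))

model.fit_generator(train_generator,
                    epochs=30,
                    validation_data=validation_generator,
                    callbacks=[EvaluateBoth()])

Evaluated this way, the training and validation scores come from exactly the same weights and the same data, so they should match.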

dennis-w answered Nov 01 '22