I'm trying to use deep learning to predict income from 15 self-reported attributes from a dating site.
We're getting rather odd results: our validation data gets better accuracy and lower loss than our training data, and this is consistent across different sizes of hidden layers. This is our model:
```python
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers

# LossHistory, seed, X and Y are defined elsewhere in our code.
for hl1 in [250, 200, 150, 100, 75, 50, 25, 15, 10, 7]:
    def baseline_model():
        model = Sequential()
        model.add(Dense(hl1, input_dim=299, kernel_initializer='normal',
                        activation='relu',
                        kernel_regularizer=regularizers.l1_l2(0.001)))
        model.add(Dropout(0.5, seed=seed))
        model.add(Dense(3, kernel_initializer='normal', activation='sigmoid'))
        model.compile(loss='categorical_crossentropy', optimizer='adamax',
                      metrics=['accuracy'])
        return model

    history_logs = LossHistory()
    model = baseline_model()
    history = model.fit(X, Y, validation_split=0.3, shuffle=False,
                        epochs=50, batch_size=10, verbose=2,
                        callbacks=[history_logs])
```
And this is an example of the accuracy and losses: [accuracy plot] and [loss plot].
We've tried removing regularization and dropout, which, as expected, ended in overfitting (training accuracy: ~85%). We've even tried decreasing the learning rate drastically, with similar results.
Has anyone seen similar results?
The training loss is higher because dropout and regularization make it artificially harder for the network to give the right answers during training. During validation, however, all of the units are available, so the network has its full computational power, and thus it might perform better than in training.
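To see this concretely, here is a minimal sketch (assuming TensorFlow 2.x / tf.keras, not necessarily the exact setup in the question) showing that a Dropout layer only drops units when it runs in training mode:

```python
# Minimal sketch (assumes TensorFlow 2.x): Dropout zeroes units only in
# training mode; at inference the full network is available.
import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 10), dtype="float32")

print(layer(x, training=True))   # roughly half the entries are zeroed
print(layer(x, training=False))  # all ones: every unit participates
```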
At times the validation loss is greater than the training loss. If both losses are also high, this may indicate that the model is underfitting. Underfitting occurs when the model is unable to accurately model even the training data, and hence generates large errors.
If your model's accuracy on your testing data is lower than your training or validation accuracy, it usually indicates that there are meaningful differences between the kind of data you trained the model on and the testing data you're providing for evaluation.
One of the easiest ways to increase validation accuracy is to add more data. This is especially useful if you don't have many training instances. If you're working on image recognition models, you may consider increasing the diversity of your available dataset by employing data augmentation.
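Note that augmentation applies to images rather than the tabular attributes in the question, but if you do work with images, a minimal sketch with Keras' ImageDataGenerator (the parameter values here are illustrative assumptions, not tuned) might look like this:

```python
# Minimal augmentation sketch (assumes an image task and tf.keras; the
# parameter values are illustrative, not tuned).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=15,       # random rotations up to 15 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    horizontal_flip=True,    # random mirroring
)

# X_train, Y_train are placeholder names for your own image data/labels:
# model.fit(datagen.flow(X_train, Y_train, batch_size=32), epochs=50)
```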
This happens when you use Dropout, since the behaviour is different when training and when testing. When training, a percentage of the features is set to zero (50% in your case, since you are using Dropout(0.5)). When testing, all features are used (and are scaled appropriately). So the model at test time is more robust, which can lead to higher testing accuracies.
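To make the "scaled appropriately" part concrete, here is a small numpy sketch of inverted dropout, the scheme Keras uses: surviving activations are scaled up by 1/(1 - p) during training, so no rescaling is needed at test time.

```python
# Numeric sketch of inverted dropout with p = 0.5: survivors are scaled
# by 1/(1 - p) at training time so the expected activation matches the
# unscaled test-time forward pass.
import numpy as np

rng = np.random.default_rng(0)
p = 0.5
activations = np.ones(8)

mask = rng.random(8) >= p                  # keep roughly half of the units
train_out = activations * mask / (1 - p)   # survivors scaled by 1/(1-p) = 2
test_out = activations                     # test time: all units, no rescaling

print(train_out)                           # e.g. [2. 0. 2. 0. ...]
print(test_out)                            # [1. 1. ... 1.]
print(train_out.mean(), test_out.mean())   # equal in expectation
```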
You can check the Keras FAQ and especially the section "Why is the training loss much higher than the testing loss?".
I would also suggest taking some time to read this very good article on some "sanity checks" you should always take into consideration when building a NN.
In addition, whenever possible, check whether your results make sense. For example, in the case of an n-class classification with categorical cross-entropy, the loss on the first epoch should be about -ln(1/n) = ln(n).
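For the three output classes in the question, that initial loss works out to roughly 1.10; a quick check:

```python
# Sanity check: expected first-epoch loss for an n-class problem with
# uniform random predictions is -ln(1/n) = ln(n).
import numpy as np

n_classes = 3
print(-np.log(1.0 / n_classes))  # ~1.0986 for the 3 classes in the question
```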
Apart from your specific case, I believe that, beyond Dropout, the dataset split may sometimes result in this situation. Especially if the split is not random (when temporal or spatial patterns exist), the validation set may be fundamentally different from the training set, i.e. have less noise or less variance, and thus be easier to predict, leading to higher accuracy on the validation set than on the training set.
Moreover, if the validation set is very small compared to the training set, then by chance the model may fit the validation set better than the training set.
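This is directly relevant here: fit(..., validation_split=0.3, shuffle=False) in the question makes Keras take the last 30% of the rows as the validation set (validation_split always slices from the end of the data, before any shuffling). A sketch of a randomized split instead, assuming scikit-learn is available and Y is one-hot encoded as in the question:

```python
# Sketch of a randomized split, assuming scikit-learn is available and
# X, Y are the arrays from the question (Y one-hot with 3 columns).
from sklearn.model_selection import train_test_split

X_train, X_val, Y_train, Y_val = train_test_split(
    X, Y,
    test_size=0.3,
    random_state=42,               # reproducible shuffle
    stratify=Y.argmax(axis=1),     # keep class proportions in both splits
)

# Then validate on the held-out set explicitly:
# model.fit(X_train, Y_train, validation_data=(X_val, Y_val),
#           epochs=50, batch_size=10)
```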