Why is val_loss different from training loss when using the same training data as validation data?

Just like this:

import numpy as np
import keras

x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)

G = keras.models.Model(x, y, name='G')
G.compile(optimizer='rmsprop', loss='mse')

data_x = np.random.random((10, 3))
data_y = np.random.random((10, 5))

G.fit(data_x, data_y, shuffle=False, validation_data=(data_x, data_y), verbose=1)

Result:

Train on 10 samples, validate on 10 samples
Epoch 1/1
10/10 [==============================] - 27s 3s/step - loss: 0.4482 - val_loss: 0.4389

The printed loss and val_loss are different. In some other tests, I found the difference to be significant. Why?

asked Mar 08 '18 by spider


1 Answer

There are several reasons that might have caused the observed difference in the values:

  1. According to the answer to this question of mine, the displayed training loss is computed before the optimization step. So even when you train on only a single batch, an optimization step is still applied between the computation of the training loss and the validation loss: the validation loss is evaluated with the already-updated weights (see the first sketch after this list).

  2. Some layers behave differently in the training phase and the testing phase, for example BatchNormalization or Dropout layers, as explained in the Keras FAQ. If you follow the link, there is also a code example showing how to get the model output for either of the two phases (without applying the optimization that happens when you call methods like model.fit or model.train_on_batch); the second sketch after this list reproduces that pattern.

  3. This is for completeness, although the differences would be much smaller than the ones you have shown: when using a GPU, some operations may be executed non-deterministically. This can show up as slight numerical differences when running the same operation several times, although I am not sure whether it is an issue in your concrete computation. Refer for example to the answers to this question regarding Tensorflow, or this comment regarding Theano.
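
To isolate the first point, you can re-evaluate the model on the training data right after fit. A minimal sketch reusing the question's example (the exact numbers will vary with the random initialization):

import numpy as np
import keras

x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)
G = keras.models.Model(x, y, name='G')
G.compile(optimizer='rmsprop', loss='mse')

data_x = np.random.random((10, 3))
data_y = np.random.random((10, 5))

history = G.fit(data_x, data_y, shuffle=False,
                validation_data=(data_x, data_y), verbose=0)

# 'loss' is computed on the batch before the weight update,
# 'val_loss' after it, so the two values differ.
print(history.history['loss'][0], history.history['val_loss'][0])

# Evaluating again on the same data (with the updated weights)
# reproduces val_loss, not the reported training loss.
print(G.evaluate(data_x, data_y, verbose=0))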
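
For the second point, the Keras FAQ shows how to query the model output in either phase via the backend learning_phase flag. A rough sketch of that pattern follows; the Dropout layer here is an assumption added purely so that the two phases produce visibly different outputs (the question's Dense-only model behaves identically in both phases):

import numpy as np
import keras
from keras import backend as K

x = keras.layers.Input(shape=(3,))
h = keras.layers.Dropout(0.5)(x)  # phase-dependent layer, added for illustration
y = keras.layers.Dense(5)(h)
model = keras.models.Model(x, y)

# A backend function that takes the learning phase as an extra input
get_output = K.function([model.input, K.learning_phase()], [model.output])

data_x = np.random.random((10, 3))
train_out = get_output([data_x, 1])[0]  # learning_phase=1: Dropout active
test_out = get_output([data_x, 0])[0]   # learning_phase=0: Dropout disabled

# The outputs differ because Dropout only drops units in training mode.
print(np.allclose(train_out, test_out))  # typically False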

answered Oct 16 '22 by KiraMichiru