Just like this:
import numpy as np
import keras

x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)
G = keras.models.Model(x, y, name='G')
G.compile(optimizer='rmsprop', loss='mse')
data_x = np.random.random((10, 3))
data_y = np.random.random((10, 5))
G.fit(data_x, data_y, shuffle=False, validation_data=(data_x, data_y), verbose=1)
Result:
Train on 10 samples, validate on 10 samples
Epoch 1/1
10/10 [==============================] - 27s 3s/step - loss: 0.4482 - val_loss: 0.4389
The printed loss and val_loss are different, even though the training and validation data are identical. In some other tests, I found the difference was significant. Why?
At times, the validation loss is greater than the training loss, which may indicate that the model is underfitting. Underfitting occurs when the model cannot accurately fit the training data and therefore produces large errors.
During validation and testing, the loss function comprises only the prediction error (regularization penalties are not applied), which generally results in a lower loss than on the training set. Notice how the gap between validation and training loss shrinks after each epoch.
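The arithmetic behind that claim can be sketched in plain NumPy. This is an illustration of the answer's reasoning, not Keras internals; the 0.01 L2 coefficient and the random data are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 3))
W = rng.random((3, 5))
y_true = rng.random((10, 5))
y_pred = X @ W

mse = np.mean((y_pred - y_true) ** 2)   # prediction error only
l2_penalty = 0.01 * np.sum(W ** 2)      # hypothetical weight-regularization term

train_loss = mse + l2_penalty  # training loss includes the penalty
val_loss = mse                 # validation loss is prediction error alone

assert train_loss > val_loss
```

With any nonzero regularization term, the reported training loss is strictly larger than a validation loss computed on the same predictions.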
val_loss is the value of the cost function on your cross-validation data, and loss is the value of the cost function on your training data.
The training loss indicates how well the model is fitting the training data, while the validation loss indicates how well the model fits new data.
There are some additional reasons that might have caused the observed difference in the values:
According to the answer to this question of mine, the displayed training loss is computed before the optimization step. So even when you train on only a single batch, an optimization step is still applied between the evaluation of the training loss and the evaluation of the validation loss.
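A minimal NumPy sketch of that effect, using a toy linear model fitted with one SGD step on MSE (an illustration of the ordering, not of Keras internals):

```python
import numpy as np

# Fit y = x * w with a single SGD step on MSE. The loss reported as
# "loss" corresponds to the value *before* the weight update; the
# validation loss is evaluated on the updated weights.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # the true weight is 2
w = 0.0
lr = 0.1

loss_before = np.mean((x * w - y) ** 2)   # what "loss" would report
grad = np.mean(2 * (x * w - y) * x)       # dMSE/dw
w -= lr * grad                            # the optimization step
loss_after = np.mean((x * w - y) ** 2)    # what "val_loss" would see

assert loss_after < loss_before
```

Because the update happens in between, val_loss can be lower than loss even when the validation set is the training set itself, as in the question.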
There are layers that behave differently in the training and testing phases, for example BatchNormalization or Dropout layers, as explained in the Keras FAQ. If you follow the link, there is also a code example showing how to get the model output for either of the two phases (without applying the optimization that happens when you call methods like model.fit, model.train_on_batch, etc.).
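The phase-dependent behavior of Dropout can be sketched in plain NumPy. This models inverted dropout with an assumed rate of 0.5; it illustrates the mechanism rather than reproducing the Keras layer:

```python
import numpy as np

# Inverted dropout: during training, random units are zeroed and the
# survivors are scaled by 1/(1-rate); during testing, the layer is the
# identity. The same input therefore produces different outputs per phase.
rng = np.random.default_rng(42)
rate = 0.5
x = np.ones((1, 8))

mask = rng.random(x.shape) >= rate          # training phase: random mask
train_out = x * mask / (1.0 - rate)

test_out = x                                # testing phase: identity

assert not np.array_equal(train_out, test_out)
```

Since the training loss is accumulated in the training phase and val_loss is computed in the testing phase, such layers alone can make the two numbers differ on identical data.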
This last point is included for completeness, although the differences it causes would be far smaller than the ones you have shown. On a GPU, some operations may execute non-deterministically, which can show up as slight numerical differences when the same operation is run several times, although I am not sure whether it is an issue in your concrete computation. See, for example, the answers to this question regarding TensorFlow, or this comment regarding Theano.