Just like this:
import numpy as np
import keras

x = keras.layers.Input(shape=(3,))
y = keras.layers.Dense(5)(x)
G = keras.models.Model(x, y, name='G')
G.compile(optimizer='rmsprop', loss='mse')
data_x = np.random.random((10, 3))
data_y = np.random.random((10, 5))
G.fit(data_x, data_y, shuffle=False, validation_data=(data_x, data_y), verbose=1)
Result:
Train on 10 samples, validate on 10 samples
Epoch 1/1
10/10 [==============================] - 27s 3s/step - loss: 0.4482 - val_loss: 0.4389
The printed loss and val_loss are different, even though the training and validation data are identical. In some other tests, I found the difference was significant. Why?
At times, the validation loss is greater than the training loss, which may indicate that the model is underfitting. Underfitting occurs when the model cannot accurately fit the training data and therefore produces large errors.
During validation and testing, the loss function comprises only the prediction error (regularization penalties are not applied), which generally results in a lower loss than on the training set. Notice how the gap between validation and training loss shrinks after each epoch.
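The arithmetic behind that claim can be sketched in plain NumPy. This is an illustration of the answer's reasoning, not Keras internals; the 0.01 L2 coefficient and the random data are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10, 3))
W = rng.random((3, 5))
y_true = rng.random((10, 5))
y_pred = X @ W

mse = np.mean((y_pred - y_true) ** 2)   # prediction error only
l2_penalty = 0.01 * np.sum(W ** 2)      # hypothetical weight-regularization term

train_loss = mse + l2_penalty  # training loss includes the penalty
val_loss = mse                 # validation loss is prediction error alone

assert train_loss > val_loss
```

With any nonzero regularization term, the reported training loss is strictly larger than a validation loss computed on the same predictions.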
val_loss is the value of the cost function on your cross-validation data, and loss is the value of the cost function on your training data.
The training loss indicates how well the model is fitting the training data, while the validation loss indicates how well the model fits new data.
There are some additional reasons that might have caused the observed difference in the values:
According to the answer to this question of mine, the displayed training loss is computed before the optimization step. So even when you train on only a single batch, an optimization step is still applied between the evaluation of the training loss and the evaluation of the validation loss.
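A minimal NumPy sketch of that effect, using a toy linear model fitted with one SGD step on MSE (an illustration of the ordering, not of Keras internals):

```python
import numpy as np

# Fit y = x * w with a single SGD step on MSE. The loss reported as
# "loss" corresponds to the value *before* the weight update; the
# validation loss is evaluated on the updated weights.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # the true weight is 2
w = 0.0
lr = 0.1

loss_before = np.mean((x * w - y) ** 2)   # what "loss" would report
grad = np.mean(2 * (x * w - y) * x)       # dMSE/dw
w -= lr * grad                            # the optimization step
loss_after = np.mean((x * w - y) ** 2)    # what "val_loss" would see

assert loss_after < loss_before
```

Because the update happens in between, val_loss can be lower than loss even when the validation set is the training set itself, as in the question.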
There are layers that behave differently in the training and testing phases, for example BatchNormalization or Dropout layers, as explained in the Keras FAQ. If you follow the link, there is also a code example showing how to get the model output for either of the two phases (without applying the optimization that happens when you call methods like model.fit, model.train_on_batch, etc.).
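The phase-dependent behavior of Dropout can be sketched in plain NumPy. This models inverted dropout with an assumed rate of 0.5; it illustrates the mechanism rather than reproducing the Keras layer:

```python
import numpy as np

# Inverted dropout: during training, random units are zeroed and the
# survivors are scaled by 1/(1-rate); during testing, the layer is the
# identity. The same input therefore produces different outputs per phase.
rng = np.random.default_rng(42)
rate = 0.5
x = np.ones((1, 8))

mask = rng.random(x.shape) >= rate          # training phase: random mask
train_out = x * mask / (1.0 - rate)

test_out = x                                # testing phase: identity

assert not np.array_equal(train_out, test_out)
```

Since the training loss is accumulated in the training phase and val_loss is computed in the testing phase, such layers alone can make the two numbers differ on identical data.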
This last point is included for completeness, although the differences it causes would be far smaller than the ones you have shown. On a GPU, some operations may execute non-deterministically, which can show up as slight numerical differences when the same operation is run several times, although I am not sure whether it is an issue in your concrete computation. See, for example, the answers to this question regarding TensorFlow, or this comment regarding Theano.