Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Keras - Validation Loss and Accuracy stuck at 0

I am trying to train a simple 2 layer Fully Connected neural net for Binary Classification in Tensorflow keras. I have split my data into Training and Validation sets with a 80-20 split using sklearn's train_test_split().

When I call model.fit(X_train, y_train, validation_data=[X_val, y_val]), it shows 0 validation loss and accuracy for all epochs, but it trains just fine.

Screenshot of model.fit call and verbose log

Also, when I try to evaluate it on the validation set, the output is non-zero.

Screenshot of model.evaluate function call

Can someone please explain why I am facing this 0 loss 0 accuracy error on validation. Thanks for your help.

Here is the complete sample code (MCVE) for this error: https://colab.research.google.com/drive/1P8iCUlnD87vqtuS5YTdoePcDOVEKpBHr?usp=sharing

like image 775
Animesh Sinha Avatar asked May 10 '20 02:05

Animesh Sinha


People also ask

Why is my validation loss constant?

It is likely that your model is overfitting to the data, especially given the size of your dataset compared to the size of your data. There is most likely nothing wrong with your code, but instead you should be using either a smaller model or increase the size of your dataset, or more likely both.

Why is my validation loss less than training loss?

The regularization terms are only applied while training the model on the training set, inflating the training loss. During validation and testing, your loss function only comprises prediction error, resulting in a generally lower loss than the training set.

Why is my validation loss fluctuating?

Your validation loss is varying wildly because your validation set is likely not representative of the whole dataset. I would recommend shuffling/resampling the validation set, or using a larger validation fraction.

Why is validation accuracy decreasing?

Validation accuracy will be usually less than training accuracy because training data is something with which the model is already familiar with and validation data is a collection of new data points which is new to the model.


1 Answers

  • If you use keras instead of tf.keras everything works fine.

  • With tf.keras, I even tried validation_data = [X_train, y_train], this also gives zero accuracy.

Here is a demonstration:

model.fit(X_train, y_train, validation_data=[X_train.to_numpy(), y_train.to_numpy()], 
epochs=10, batch_size=64)

Epoch 1/10
8/8 [==============================] - 0s 6ms/step - loss: 0.7898 - accuracy: 0.6087 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6710 - accuracy: 0.6500 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 3/10
8/8 [==============================] - 0s 5ms/step - loss: 0.6748 - accuracy: 0.6500 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 4/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6716 - accuracy: 0.6370 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 5/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6085 - accuracy: 0.6326 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 6/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6744 - accuracy: 0.6326 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 7/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6102 - accuracy: 0.6522 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 8/10
8/8 [==============================] - 0s 6ms/step - loss: 0.7032 - accuracy: 0.6109 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 9/10
8/8 [==============================] - 0s 5ms/step - loss: 0.6283 - accuracy: 0.6717 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 10/10
8/8 [==============================] - 0s 5ms/step - loss: 0.6120 - accuracy: 0.6652 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00

So, definitely there is some issue with tensorflow implementation of fit.

I dug up the source, and it seems the part responsible for validation_data:

...
...
        # Run validation.
        if validation_data and self._should_eval(epoch, validation_freq):
          val_x, val_y, val_sample_weight = (
              data_adapter.unpack_x_y_sample_weight(validation_data))
          val_logs = self.evaluate(
              x=val_x,
              y=val_y,
              sample_weight=val_sample_weight,
              batch_size=validation_batch_size or batch_size,
              steps=validation_steps,
              callbacks=callbacks,
              max_queue_size=max_queue_size,
              workers=workers,
              use_multiprocessing=use_multiprocessing,
              return_dict=True)
          val_logs = {'val_' + name: val for name, val in val_logs.items()}
          epoch_logs.update(val_logs)

internally calls model.evaluate, as we have already established evaluate works fine, I realized the only culprit could be unpack_x_y_sample_weight.

So, I looked into the implementation:

def unpack_x_y_sample_weight(data):
  """Unpacks user-provided data tuple."""
  if not isinstance(data, tuple):
    return (data, None, None)
  elif len(data) == 1:
    return (data[0], None, None)
  elif len(data) == 2:
    return (data[0], data[1], None)
  elif len(data) == 3:
    return (data[0], data[1], data[2])

  raise ValueError("Data not understood.")

It's crazy, but if you just pass a tuple instead of a list, everything works fine due to the check inside unpack_x_y_sample_weight. (Your labels are missing after this step and somehow the data is getting fixed inside evaluate, so you're training with no reasonable labels, this seems like a bug but the documentation clearly states to pass tuple)

The following code gives correct validation accuracy and loss:

model.fit(X_train, y_train, validation_data=(X_train.to_numpy(), y_train.to_numpy()), 
epochs=10, batch_size=64)

Epoch 1/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5832 - accuracy: 0.6696 - val_loss: 0.6892 - val_accuracy: 0.6674
Epoch 2/10
8/8 [==============================] - 0s 7ms/step - loss: 0.6385 - accuracy: 0.6804 - val_loss: 0.8984 - val_accuracy: 0.5565
Epoch 3/10
8/8 [==============================] - 0s 7ms/step - loss: 0.6822 - accuracy: 0.6391 - val_loss: 0.6556 - val_accuracy: 0.6739
Epoch 4/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6276 - accuracy: 0.6609 - val_loss: 1.0691 - val_accuracy: 0.5630
Epoch 5/10
8/8 [==============================] - 0s 7ms/step - loss: 0.7048 - accuracy: 0.6239 - val_loss: 0.6474 - val_accuracy: 0.6326
Epoch 6/10
8/8 [==============================] - 0s 7ms/step - loss: 0.6545 - accuracy: 0.6500 - val_loss: 0.6659 - val_accuracy: 0.6043
Epoch 7/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5796 - accuracy: 0.6913 - val_loss: 0.6891 - val_accuracy: 0.6435
Epoch 8/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5915 - accuracy: 0.6891 - val_loss: 0.5307 - val_accuracy: 0.7152
Epoch 9/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5571 - accuracy: 0.7000 - val_loss: 0.5465 - val_accuracy: 0.6957
Epoch 10/10
8/8 [==============================] - 0s 7ms/step - loss: 0.7133 - accuracy: 0.6283 - val_loss: 0.7046 - val_accuracy: 0.6413

So, as this seems to be a bug, I have just opened a relevant issue at Tensorflow Github repo:

https://github.com/tensorflow/tensorflow/issues/39370

like image 156
Zabir Al Nazi Avatar answered Oct 08 '22 13:10

Zabir Al Nazi