
Keras - Validation Loss and Accuracy stuck at 0

Tags: python, machine-learning, tensorflow, keras, tf.keras

I am trying to train a simple two-layer fully connected neural network for binary classification in TensorFlow Keras (tf.keras). I have split my data into training and validation sets with an 80-20 split using sklearn's train_test_split().

When I call model.fit(X_train, y_train, validation_data=[X_val, y_val]), it reports a validation loss and accuracy of 0 for every epoch, even though training itself proceeds normally.

[Screenshot of model.fit call and verbose log: https://i.stack.imgur.com/aJVoW.png]

Also, when I evaluate the model on the validation set with model.evaluate, the output is non-zero.

[Screenshot of model.evaluate function call: https://i.stack.imgur.com/8C34g.png]

Can someone please explain why validation shows 0 loss and 0 accuracy? Thanks for your help.

Here is the complete sample code (MCVE) for this error: https://colab.research.google.com/drive/1P8iCUlnD87vqtuS5YTdoePcDOVEKpBHr?usp=sharing


asked May 10 '20 02:05

Animesh Sinha

1 Answer

If you use keras instead of tf.keras everything works fine.
With tf.keras, I even tried validation_data = [X_train, y_train], this also gives zero accuracy.

Here is a demonstration:

model.fit(X_train, y_train, validation_data=[X_train.to_numpy(), y_train.to_numpy()], 
epochs=10, batch_size=64)

Epoch 1/10
8/8 [==============================] - 0s 6ms/step - loss: 0.7898 - accuracy: 0.6087 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6710 - accuracy: 0.6500 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 3/10
8/8 [==============================] - 0s 5ms/step - loss: 0.6748 - accuracy: 0.6500 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 4/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6716 - accuracy: 0.6370 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 5/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6085 - accuracy: 0.6326 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 6/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6744 - accuracy: 0.6326 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 7/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6102 - accuracy: 0.6522 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 8/10
8/8 [==============================] - 0s 6ms/step - loss: 0.7032 - accuracy: 0.6109 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 9/10
8/8 [==============================] - 0s 5ms/step - loss: 0.6283 - accuracy: 0.6717 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 10/10
8/8 [==============================] - 0s 5ms/step - loss: 0.6120 - accuracy: 0.6652 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00

So there is definitely some issue with the TensorFlow implementation of fit.

I dug into the source, and this seems to be the part responsible for validation_data:

...
...
        # Run validation.
        if validation_data and self._should_eval(epoch, validation_freq):
          val_x, val_y, val_sample_weight = (
              data_adapter.unpack_x_y_sample_weight(validation_data))
          val_logs = self.evaluate(
              x=val_x,
              y=val_y,
              sample_weight=val_sample_weight,
              batch_size=validation_batch_size or batch_size,
              steps=validation_steps,
              callbacks=callbacks,
              max_queue_size=max_queue_size,
              workers=workers,
              use_multiprocessing=use_multiprocessing,
              return_dict=True)
          val_logs = {'val_' + name: val for name, val in val_logs.items()}
          epoch_logs.update(val_logs)

It internally calls model.evaluate. Since we have already established that evaluate works fine, I realized the only culprit could be unpack_x_y_sample_weight.

So, I looked into the implementation:

def unpack_x_y_sample_weight(data):
  """Unpacks user-provided data tuple."""
  if not isinstance(data, tuple):
    return (data, None, None)
  elif len(data) == 1:
    return (data[0], None, None)
  elif len(data) == 2:
    return (data[0], data[1], None)
  elif len(data) == 3:
    return (data[0], data[1], data[2])

  raise ValueError("Data not understood.")

It's crazy, but if you just pass a tuple instead of a list, everything works fine, because of the check inside unpack_x_y_sample_weight. A list fails the isinstance(data, tuple) test, so the whole list is returned as x and the labels come back as None. Your labels are missing after this step, and somehow the data still gets accepted inside evaluate, so validation runs with no reasonable labels. This seems like a bug, although the documentation clearly states to pass a tuple.
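To see the effect in isolation, here is a small sketch that copies the unpack_x_y_sample_weight function quoted above into a standalone script (no TensorFlow needed) and calls it once with a list and once with a tuple. The x_val and y_val values are made up purely for illustration.

```python
def unpack_x_y_sample_weight(data):
    """Standalone copy of the helper quoted above, for illustration only."""
    if not isinstance(data, tuple):
        return (data, None, None)
    elif len(data) == 1:
        return (data[0], None, None)
    elif len(data) == 2:
        return (data[0], data[1], None)
    elif len(data) == 3:
        return (data[0], data[1], data[2])
    raise ValueError("Data not understood.")


# Made-up stand-ins for the validation features and labels.
x_val = [[0.1, 0.2], [0.3, 0.4]]
y_val = [0, 1]

# List: fails the isinstance(data, tuple) check, so the whole list becomes x.
list_x, list_y, list_w = unpack_x_y_sample_weight([x_val, y_val])

# Tuple: unpacked into features and labels as intended.
tuple_x, tuple_y, tuple_w = unpack_x_y_sample_weight((x_val, y_val))

print(list_y)   # None
print(tuple_y)  # [0, 1]
```

With the list, the labels never reach evaluate as y, which matches the missing-labels behaviour described above.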

The following code gives correct validation accuracy and loss:

model.fit(X_train, y_train, validation_data=(X_train.to_numpy(), y_train.to_numpy()), 
epochs=10, batch_size=64)

Epoch 1/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5832 - accuracy: 0.6696 - val_loss: 0.6892 - val_accuracy: 0.6674
Epoch 2/10
8/8 [==============================] - 0s 7ms/step - loss: 0.6385 - accuracy: 0.6804 - val_loss: 0.8984 - val_accuracy: 0.5565
Epoch 3/10
8/8 [==============================] - 0s 7ms/step - loss: 0.6822 - accuracy: 0.6391 - val_loss: 0.6556 - val_accuracy: 0.6739
Epoch 4/10
8/8 [==============================] - 0s 6ms/step - loss: 0.6276 - accuracy: 0.6609 - val_loss: 1.0691 - val_accuracy: 0.5630
Epoch 5/10
8/8 [==============================] - 0s 7ms/step - loss: 0.7048 - accuracy: 0.6239 - val_loss: 0.6474 - val_accuracy: 0.6326
Epoch 6/10
8/8 [==============================] - 0s 7ms/step - loss: 0.6545 - accuracy: 0.6500 - val_loss: 0.6659 - val_accuracy: 0.6043
Epoch 7/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5796 - accuracy: 0.6913 - val_loss: 0.6891 - val_accuracy: 0.6435
Epoch 8/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5915 - accuracy: 0.6891 - val_loss: 0.5307 - val_accuracy: 0.7152
Epoch 9/10
8/8 [==============================] - 0s 7ms/step - loss: 0.5571 - accuracy: 0.7000 - val_loss: 0.5465 - val_accuracy: 0.6957
Epoch 10/10
8/8 [==============================] - 0s 7ms/step - loss: 0.7133 - accuracy: 0.6283 - val_loss: 0.7046 - val_accuracy: 0.6413
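If the validation data reaches your code as a list (for example, returned from another function), one defensive option is to coerce it to a tuple before calling fit. This is only a sketch: as_validation_tuple is a hypothetical helper name, not part of Keras, and it assumes the list holds either [x, y] or [x, y, sample_weight].

```python
def as_validation_tuple(validation_data):
    """Hypothetical helper (not part of Keras): turn [x, y] into (x, y)."""
    if isinstance(validation_data, list):
        if len(validation_data) not in (2, 3):
            raise ValueError("Expected [x, y] or [x, y, sample_weight].")
        return tuple(validation_data)
    # Tuples and other objects (e.g. datasets) are passed through unchanged.
    return validation_data


# Made-up stand-ins for the validation split.
x_val = [[0.1, 0.2], [0.3, 0.4]]
y_val = [0, 1]

fixed = as_validation_tuple([x_val, y_val])
print(type(fixed).__name__)  # tuple

# Assumed usage with the model from the question:
# model.fit(X_train, y_train,
#           validation_data=as_validation_tuple([X_val, y_val]))
```

In most cases simply writing validation_data=(X_val, y_val) directly is enough; the helper only matters when the container type is not under your control.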

Since this seems to be a bug, I have opened a relevant issue on the TensorFlow GitHub repo:

https://github.com/tensorflow/tensorflow/issues/39370


answered Oct 08 '22 13:10

Zabir Al Nazi
