Same value for Keras 2.3.0 metrics accuracy, precision and recall

I'm trying to get Keras metrics for accuracy, precision and recall, but all three of them show the same value, which is actually the accuracy.

I'm using the metrics list provided in an example from the TensorFlow documentation:

from tensorflow import keras

metrics = [keras.metrics.TruePositives(name='tp'),
           keras.metrics.FalsePositives(name='fp'),
           keras.metrics.TrueNegatives(name='tn'),
           keras.metrics.FalseNegatives(name='fn'),
           keras.metrics.BinaryAccuracy(name='accuracy'),
           keras.metrics.Precision(name='precision'),
           keras.metrics.Recall(name='recall'),
           keras.metrics.AUC(name='auc')]

The model is a pretty basic CNN for image classification:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()

model.add(Convolution2D(32, (7, 7),
                        padding="same",
                        input_shape=(255, 255, 3),
                        activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, (3, 3),
                        padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes, activation='softmax'))

Compiling with the metric list shown above:

model.compile(loss=loss,
              optimizer=optimizer,
              metrics=metrics)

This is an example of the problem I see all the time while training:

Epoch 1/15
160/160 [==============================] - 6s 37ms/step - loss: 0.6402 - tp: 215.0000 - fp: 105.0000 - tn: 215.0000 - fn: 105.0000 - accuracy: 0.6719 - precision: 0.6719 - recall: 0.6719 - auc: 0.7315 - val_loss: 0.6891 - val_tp: 38.0000 - val_fp: 42.0000 - val_tn: 38.0000 - val_fn: 42.0000 - val_accuracy: 0.4750 - val_precision: 0.4750 - val_recall: 0.4750 - val_auc: 0.7102
Epoch 2/15
160/160 [==============================] - 5s 30ms/step - loss: 0.6929 - tp: 197.0000 - fp: 123.0000 - tn: 197.0000 - fn: 123.0000 - accuracy: 0.6156 - precision: 0.6156 - recall: 0.6156 - auc: 0.6941 - val_loss: 0.6906 - val_tp: 38.0000 - val_fp: 42.0000 - val_tn: 38.0000 - val_fn: 42.0000 - val_accuracy: 0.4750 - val_precision: 0.4750 - val_recall: 0.4750 - val_auc: 0.6759

Metrics per fold, with the same value for accuracy, precision and recall every time:

['loss', 'tp', 'fp', 'tn', 'fn', 'accuracy', 'precision', 'recall', 'auc']
[[ 0.351 70.    10.    70.    10.     0.875  0.875  0.875  0.945]
 [ 0.091 78.     2.    78.     2.     0.975  0.975  0.975  0.995]
 [ 0.253 72.     8.    72.     8.     0.9    0.9    0.9    0.974]
 [ 0.04  78.     2.    78.     2.     0.975  0.975  0.975  0.999]
 [ 0.021 80.     0.    80.     0.     1.     1.     1.     1.   ]]

sklearn.metrics.classification_report, however, shows the right precision and recall:

================ Fold 1 =====================
Accuracy: 0.8875
              precision    recall  f1-score   support

      normal       0.84      0.95      0.89        38
          pm       0.95      0.83      0.89        42

    accuracy                           0.89        80
   macro avg       0.89      0.89      0.89        80
weighted avg       0.89      0.89      0.89        80

================ Fold 2 =====================
Accuracy: 0.9375
              precision    recall  f1-score   support

      normal       1.00      0.87      0.93        38
          pm       0.89      1.00      0.94        42

    accuracy                           0.94        80
   macro avg       0.95      0.93      0.94        80
weighted avg       0.94      0.94      0.94        80

================ Fold 3 =====================
Accuracy: 0.925
              precision    recall  f1-score   support

      normal       0.88      0.97      0.92        37
          pm       0.97      0.88      0.93        43

    accuracy                           0.93        80
   macro avg       0.93      0.93      0.92        80
weighted avg       0.93      0.93      0.93        80

================ Fold 4 =====================
Accuracy: 0.925
              precision    recall  f1-score   support

      normal       0.97      0.86      0.91        37
          pm       0.89      0.98      0.93        43

    accuracy                           0.93        80
   macro avg       0.93      0.92      0.92        80
weighted avg       0.93      0.93      0.92        80

================ Fold 5 =====================
Accuracy: 1.0
              precision    recall  f1-score   support

      normal       1.00      1.00      1.00        37
          pm       1.00      1.00      1.00        43

    accuracy                           1.00        80
   macro avg       1.00      1.00      1.00        80
weighted avg       1.00      1.00      1.00        80
Daniel López asked May 16 '20

2 Answers

When I posted my question I didn't realize that the true positives and false positives also had the same values as the true negatives and false negatives. My validation set has 80 observations, so tp: 70, fp: 10, tn: 70, fn: 10 actually meant that 70 observations were correctly predicted while 10 were wrong, regardless of the class of each observation.

I wasn't able to figure out why all these metrics were messed up; maybe it's just the issue Zabir Al Nazi kindly mentioned. However, I was able to get proper metrics thanks to a few small changes:

  • Loss function: binary_crossentropy instead of categorical_crossentropy.
  • Top layer: 1 neuron sigmoid instead of n_classes neurons softmax.
  • Labels shape: 1D numpy array instead of one-hot encoded.
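As a pure-Python sketch (illustrative only, not Keras code): once the labels are a 1-D array, the four counts are computed with respect to a single positive class and no longer mirror each other, so precision, recall and accuracy can differ:

```python
# Illustrative sketch (plain Python, not Keras): binary counts on 1-D labels.
def binary_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn

y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0]
tp, fp, tn, fn = binary_counts(y_true, y_pred)
print(tp, fp, tn, fn)                  # 1 1 1 2
print(tp / (tp + fp), tp / (tp + fn))  # precision 0.5, recall 0.333...
```

Note how tp != tn and fp != fn here, unlike the symmetric counts in the question.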

I hope this can help someone else.

Daniel López answered Sep 22 '22


The problem of equal TP and TN counts lies in using one-hot encoded labels for binary classification. One-hot labels look like [[0,1], [0,1], [1,0],[1,0],[0,1],[1,0],….,[0,1],[1,0]], so whenever the algorithm correctly predicts class A, expressed as [1,0], the metrics count both a TP for class A and a TN for class B. That is why you end up with 70 TP and 70 TN on a sample of 80 observations.
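This element-wise double counting can be sketched in plain Python (an illustrative approximation of what the thresholded Keras metrics effectively see on one-hot inputs, not the actual Keras implementation). Every correct sample adds one TP and one TN, every wrong sample one FP and one FN, so precision, recall and accuracy all collapse to the same number:

```python
# Illustrative sketch: element-wise counting over one-hot, 2-class labels.
def elementwise_counts(y_true, y_pred):
    tp = fp = tn = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 0 and p == 0:
                tn += 1
            else:
                fn += 1
    return tp, fp, tn, fn

# 4 samples, 3 predicted correctly, 1 wrong
y_true = [[1, 0], [0, 1], [1, 0], [0, 1]]
y_pred = [[1, 0], [0, 1], [1, 0], [1, 0]]
tp, fp, tn, fn = elementwise_counts(y_true, y_pred)
print(tp, fp, tn, fn)   # 3 1 3 1
print(tp / (tp + fp))   # precision == 0.75 == accuracy
print(tp / (tp + fn))   # recall    == 0.75 == accuracy
```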

The solution described in your update with more details:

  1. Transform the output of the dense layer to a single output unit: model.add(Dense(1, activation='sigmoid'))

  2. Change the format of y to a 1-D array, [1,1,0,0,1,0,….,1,0], instead of one-hot vectors [[0,1], [0,1], [1,0],[1,0],[0,1],[1,0],….,[0,1],[1,0]], and

  3. Change the loss function to binary cross-entropy: model.compile(loss="binary_crossentropy", optimizer=optimizer, metrics=metrics)
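Step 2 can be sketched in plain Python (a hypothetical helper, not part of Keras; on a NumPy array, y.argmax(axis=1) does the same):

```python
# Hypothetical helper: convert one-hot pairs to the 1-D labels
# that the sigmoid / binary_crossentropy setup expects.
def onehot_to_1d(labels):
    """Map each one-hot row, e.g. [1, 0] or [0, 1], to its class index."""
    return [row.index(max(row)) for row in labels]

y_onehot = [[0, 1], [0, 1], [1, 0], [1, 0], [0, 1]]
print(onehot_to_1d(y_onehot))  # [1, 1, 0, 0, 1]
```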

Keras does not offer an "automatic transition" from a multi-class (one-hot) formulation of a binary problem to a truly binary one.

Tasos Lytos answered Sep 19 '22