I'm trying to get Keras metrics for accuracy, precision, and recall, but all three of them show the same value, which is actually the accuracy. I'm using the metrics list from an example in the TensorFlow documentation:
from tensorflow import keras

metrics = [keras.metrics.TruePositives(name='tp'),
           keras.metrics.FalsePositives(name='fp'),
           keras.metrics.TrueNegatives(name='tn'),
           keras.metrics.FalseNegatives(name='fn'),
           keras.metrics.BinaryAccuracy(name='accuracy'),
           keras.metrics.Precision(name='precision'),
           keras.metrics.Recall(name='recall'),
           keras.metrics.AUC(name='auc')]
The model is a pretty basic CNN for image classification:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Convolution2D(32,
                        (7, 7),
                        padding="same",
                        input_shape=(255, 255, 3),
                        activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64,
                        (3, 3),
                        padding="same"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256,
                activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes,
                activation='softmax'))
Compiling with the metric list shown above:
model.compile(loss=loss,
              optimizer=optimizer,
              metrics=metrics)
This is an example of the problem I see all the time while training:
Epoch 1/15
160/160 [==============================] - 6s 37ms/step - loss: 0.6402 - tp: 215.0000 - fp: 105.0000 - tn: 215.0000 - fn: 105.0000 - accuracy: 0.6719 - precision: 0.6719 - recall: 0.6719 - auc: 0.7315 - val_loss: 0.6891 - val_tp: 38.0000 - val_fp: 42.0000 - val_tn: 38.0000 - val_fn: 42.0000 - val_accuracy: 0.4750 - val_precision: 0.4750 - val_recall: 0.4750 - val_auc: 0.7102
Epoch 2/15
160/160 [==============================] - 5s 30ms/step - loss: 0.6929 - tp: 197.0000 - fp: 123.0000 - tn: 197.0000 - fn: 123.0000 - accuracy: 0.6156 - precision: 0.6156 - recall: 0.6156 - auc: 0.6941 - val_loss: 0.6906 - val_tp: 38.0000 - val_fp: 42.0000 - val_tn: 38.0000 - val_fn: 42.0000 - val_accuracy: 0.4750 - val_precision: 0.4750 - val_recall: 0.4750 - val_auc: 0.6759
Metrics per fold, with the same value for accuracy, precision and recall every time:
['loss', 'tp', 'fp', 'tn', 'fn', 'accuracy', 'precision', 'recall', 'auc']
[[ 0.351 70. 10. 70. 10. 0.875 0.875 0.875 0.945]
[ 0.091 78. 2. 78. 2. 0.975 0.975 0.975 0.995]
[ 0.253 72. 8. 72. 8. 0.9 0.9 0.9 0.974]
[ 0.04 78. 2. 78. 2. 0.975 0.975 0.975 0.999]
[ 0.021 80. 0. 80. 0. 1. 1. 1. 1. ]]
sklearn.metrics.classification_report shows the correct precision and recall:
================ Fold 1 =====================
Accuracy: 0.8875
              precision    recall  f1-score   support

      normal       0.84      0.95      0.89        38
          pm       0.95      0.83      0.89        42

    accuracy                           0.89        80
   macro avg       0.89      0.89      0.89        80
weighted avg       0.89      0.89      0.89        80
================ Fold 2 =====================
Accuracy: 0.9375
              precision    recall  f1-score   support

      normal       1.00      0.87      0.93        38
          pm       0.89      1.00      0.94        42

    accuracy                           0.94        80
   macro avg       0.95      0.93      0.94        80
weighted avg       0.94      0.94      0.94        80
================ Fold 3 =====================
Accuracy: 0.925
              precision    recall  f1-score   support

      normal       0.88      0.97      0.92        37
          pm       0.97      0.88      0.93        43

    accuracy                           0.93        80
   macro avg       0.93      0.93      0.92        80
weighted avg       0.93      0.93      0.93        80
================ Fold 4 =====================
Accuracy: 0.925
              precision    recall  f1-score   support

      normal       0.97      0.86      0.91        37
          pm       0.89      0.98      0.93        43

    accuracy                           0.93        80
   macro avg       0.93      0.92      0.92        80
weighted avg       0.93      0.93      0.92        80
================ Fold 5 =====================
Accuracy: 1.0
              precision    recall  f1-score   support

      normal       1.00      1.00      1.00        37
          pm       1.00      1.00      1.00        43

    accuracy                           1.00        80
   macro avg       1.00      1.00      1.00        80
weighted avg       1.00      1.00      1.00        80
When I posted my question I didn't realize that the true positives and false positives also had the same values as the true negatives and false negatives. My validation set has 80 observations, so tp = tn = 70 and fp = fn = 10 actually meant that 70 observations were predicted correctly and 10 incorrectly, regardless of the class of each observation.
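With tp = tn and fp = fn, accuracy, precision, and recall necessarily collapse to the same number. Checking with the Fold 1 values from the table above:

tp, fp, tn, fn = 70, 10, 70, 10  # Fold 1 values from the table above

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # 140 / 160 = 0.875
precision = tp / (tp + fp)                   # 70 / 80   = 0.875
recall    = tp / (tp + fn)                   # 70 / 80   = 0.875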
I wasn't able to figure out why all these metrics were messed up; maybe it's just the issue Zabir Al Nazi kindly mentioned. However, I was able to get proper metrics thanks to some small changes (detailed in the answer below).
I hope this can help someone else.
The problem of having equal TP and TN lies in the use of labels formatted as one-hot encoded vectors for binary classification. One-hot encoded labels look like [[0,1], [0,1], [1,0], [1,0], [0,1], [1,0], …, [0,1], [1,0]], so whenever the algorithm correctly predicts class A, expressed as [1,0] in the label, the metrics count both a TP for class A and a TN for class B. Therefore, it ends up with 70 TP and 70 TN on a sample of 80 observations.
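A minimal sketch of this double counting, using a hypothetical toy batch of four samples (three predicted correctly):

import tensorflow as tf

# One-hot labels and softmax-style predictions; only sample 2 is misclassified
y_true_onehot = [[1, 0], [1, 0], [0, 1], [0, 1]]
y_pred_onehot = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]

tp = tf.keras.metrics.TruePositives()
tn = tf.keras.metrics.TrueNegatives()
tp.update_state(y_true_onehot, y_pred_onehot)
tn.update_state(y_true_onehot, y_pred_onehot)
print(tp.result().numpy(), tn.result().numpy())  # 3.0 3.0 -- every correct sample
                                                 # counts as one TP and one TN

# Same data as 1-D binary labels: the counts now differ
y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.8, 0.9, 0.7]
tp2 = tf.keras.metrics.TruePositives()
tn2 = tf.keras.metrics.TrueNegatives()
tp2.update_state(y_true, y_pred)
tn2.update_state(y_true, y_pred)
print(tp2.result().numpy(), tn2.result().numpy())  # 2.0 1.0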
The solution described in your update, with more details:
1. Transform the output of the dense layer to have a single output unit:
model.add(Dense(1, activation='sigmoid'))
2. Change the format of y to a 1-D array such as [1,1,0,0,1,0,…,1,0] instead of one-hot vectors [[0,1], [0,1], [1,0], [1,0], [0,1], [1,0], …, [0,1], [1,0]].
3. Change the loss function to binary cross-entropy, e.g. model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=metrics). A sketch combining all three changes follows this list.
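Putting the three changes together, a minimal sketch (y_onehot is a hypothetical array holding the original one-hot labels; model, optimizer, and metrics are as defined above):

import numpy as np

# Step 2: collapse one-hot vectors [[0,1],[1,0],...] into a 1-D array [1,0,...]
y = np.argmax(y_onehot, axis=1)

# Step 1: single sigmoid unit instead of a 2-way softmax
model.add(Dense(1, activation='sigmoid'))

# Step 3: binary cross-entropy loss with the same metrics list
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=metrics)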
Keras does not offer an "automatic transition" from a multi-class label format to a binary one.