
Why is my confusion matrix returning only one number?

I'm doing binary classification. Whenever all of my predictions equal the ground truth, sklearn.metrics.confusion_matrix returns a single value. Is something wrong?

from sklearn.metrics import confusion_matrix
print(confusion_matrix([True, True], [True, True]))
# [[2]]

I would expect something like:

[[2 0]
 [0 0]]
arnaud asked Dec 11 '20 09:12



1 Answer

Pass labels=[True, False] explicitly:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true=[True, True], y_pred=[True, True], labels=[True, False])
print(cm)

# [[2 0]
#  [0 0]]

Why?

From the docs, the output of confusion_matrix(y_true, y_pred) is:

C: ndarray of shape (n_classes, n_classes)

The variable n_classes is either:

  • inferred as the number of distinct values that appear in y_true and y_pred combined
  • taken from the length of the optional labels parameter

In your case, because you did not pass labels, n_classes is inferred from the distinct values in y_true and y_pred. Both are [True, True], so only one class is found and the result is the 1×1 matrix [[2]].
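
To see the two cases side by side, here is a minimal sketch. The np.unique call only imitates how the label set is inferred and is an assumption for illustration, not necessarily what scikit-learn does internally. The variable names are also just for this example.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [True, True]
y_pred = [True, True]

# Illustration only: count the distinct values seen across both inputs.
# Here only True appears, so a single class is found.
inferred = np.unique(np.concatenate([y_true, y_pred]))

# Without labels, the matrix shape follows the inferred classes.
cm_inferred = confusion_matrix(y_true, y_pred)
print(len(inferred), cm_inferred.shape)

# With labels, the matrix shape follows len(labels), even though
# False never appears in the data.
cm_explicit = confusion_matrix(y_true, y_pred, labels=[True, False])
print(cm_explicit.shape)
```

The order of labels also fixes the order of rows and columns, so with labels=[True, False] the True class comes first.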

arnaud answered Sep 22 '22 15:09