How to normalize a confusion matrix?

I calculated a confusion matrix for my classifier using confusion_matrix() from scikit-learn. The diagonal elements of the confusion matrix represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabeled by the classifier.

I would like to normalize my confusion matrix so that it contains only numbers between 0 and 1. I would like to read the percentage of correctly classified samples from the matrix.

I found several methods for normalizing a matrix (row and column normalization), but I don't know much about the maths and am not sure whether this is the correct approach.
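
For concreteness, here is a tiny made-up example of the structure I mean (the labels below are invented purely for illustration):

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 2]   # made-up true labels
y_pred = [0, 1, 2, 2]   # made-up predictions

cm = confusion_matrix(y_true, y_pred)
# cm[i, j] counts the samples with true label i that were predicted as j,
# so the diagonal holds the correctly classified counts:
# [[1 0 0]
#  [0 1 1]
#  [0 0 1]]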

Kaly asked Jan 04 '14



2 Answers

Suppose that

>>> y_true = [0, 0, 1, 1, 2, 0, 1]
>>> y_pred = [0, 1, 0, 1, 2, 2, 1]
>>> C = confusion_matrix(y_true, y_pred)
>>> C
array([[1, 1, 1],
       [1, 2, 0],
       [0, 0, 1]])

Then, to find out what fraction of the samples in each class received their correct label, you need

>>> C / C.astype(float).sum(axis=1, keepdims=True)
array([[ 0.33333333,  0.33333333,  0.33333333],
       [ 0.33333333,  0.66666667,  0.        ],
       [ 0.        ,  0.        ,  1.        ]])

(keepdims=True keeps the row sums as a column vector, so each row is divided by its own total; np.float has been removed from recent NumPy, so plain float is used here.)

The diagonal contains the required values. Another way to compute these is to realize that what you're computing is the recall per class:

>>> from sklearn.metrics import precision_recall_fscore_support
>>> _, recall, _, _ = precision_recall_fscore_support(y_true, y_pred)
>>> recall
array([ 0.33333333,  0.66666667,  1.        ])

Similarly, if you divide by the sum over axis=0, you get the precision (fraction of class-k predictions that have ground truth label k):

>>> C / C.astype(float).sum(axis=0)
array([[ 0.5       ,  0.33333333,  0.5       ],
       [ 0.5       ,  0.66666667,  0.        ],
       [ 0.        ,  0.        ,  0.5       ]])
>>> prec, _, _, _ = precision_recall_fscore_support(y_true, y_pred)
>>> prec
array([ 0.5       ,  0.66666667,  0.5       ])
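
For completeness, a quick sanity check (using the same C as above) that the diagonal of the row-normalized matrix really is the per-class recall:

>>> import numpy as np
>>> np.diag(C / C.astype(float).sum(axis=1, keepdims=True))
array([ 0.33333333,  0.66666667,  1.        ])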
Fred Foo answered Sep 20 '22


From the sklearn documentation (the confusion matrix plot example):

cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis] 

where cm is the confusion matrix as provided by sklearn.
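
As a side note (not part of the original answer): newer scikit-learn releases (0.22 and later, if that is available to you) can do this normalization directly through the normalize argument of confusion_matrix. A minimal sketch, reusing the labels from the first answer:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 2, 0, 1]
y_pred = [0, 1, 0, 1, 2, 2, 1]

# normalize='true' divides each row by the number of samples with that true
# label, i.e. the same row-wise normalization as the snippet above
cm = confusion_matrix(y_true, y_pred, normalize='true')
print(cm)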

Antoni answered Sep 18 '22