I calculated a confusion matrix for my classifier using confusion_matrix()
from scikit-learn. The diagonal elements of the confusion matrix represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabeled by the classifier.
I would like to normalize my confusion matrix so that it contains only numbers between 0 and 1, and read the percentage of correctly classified samples off the matrix.
I found several methods for normalizing a matrix (row and column normalization), but I don't know much about maths and am not sure which of these is the correct approach.
"Normalized" here means that each class (row) is rescaled so that it sums to 1.00: each row of a normalized confusion matrix represents 100% of the samples in a particular topic, cluster, or class, and each entry is the fraction of that class's samples that received a given predicted label.
The best accuracy is 1.0 and the worst is 0.0; it can also be computed as 1 - ERR (the error rate). Accuracy is the total number of correct predictions (TP + TN) divided by the total size of the dataset (P + N).
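As a small sketch of both quantities (assuming NumPy and a hypothetical 3x3 confusion matrix C, with rows as true classes and columns as predicted classes):

import numpy as np

# hypothetical confusion matrix: rows = true class, columns = predicted class
C = np.array([[1, 1, 1],
              [1, 2, 0],
              [0, 0, 1]])

# row-normalize: every row then sums to 1.0, i.e. 100% of that class's samples
C_norm = C / C.sum(axis=1, keepdims=True)
print(C_norm.sum(axis=1))        # [1. 1. 1.]

# accuracy: correct predictions (the diagonal) over all samples, i.e. 1 - ERR
accuracy = np.trace(C) / C.sum()
print(accuracy)                  # ~0.571 for this example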
Suppose that
>>> import numpy as np
>>> from sklearn.metrics import confusion_matrix
>>> y_true = [0, 0, 1, 1, 2, 0, 1]
>>> y_pred = [0, 1, 0, 1, 2, 2, 1]
>>> C = confusion_matrix(y_true, y_pred)
>>> C
array([[1, 1, 1],
       [1, 2, 0],
       [0, 0, 1]])
Then, to find out how many samples per class have received their correct label, you need
>>> C / C.sum(axis=1, keepdims=True)
array([[0.33333333, 0.33333333, 0.33333333],
       [0.33333333, 0.66666667, 0.        ],
       [0.        , 0.        , 1.        ]])
The diagonal contains the required values. Another way to compute these is to realize that what you're computing is the recall per class:
>>> from sklearn.metrics import precision_recall_fscore_support
>>> _, recall, _, _ = precision_recall_fscore_support(y_true, y_pred)
>>> recall
array([0.33333333, 0.66666667, 1.        ])
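As a quick check (a sketch continuing the session above, so C and recall are assumed to be defined already), the diagonal of the row-normalized matrix is exactly this recall array:

>>> np.allclose(np.diag(C / C.sum(axis=1, keepdims=True)), recall)
True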
Similarly, if you divide by the sum over axis=0, you get the precision (the fraction of class-k predictions that have ground truth label k):
>>> C / C.sum(axis=0)
array([[0.5       , 0.33333333, 0.5       ],
       [0.5       , 0.66666667, 0.        ],
       [0.        , 0.        , 0.5       ]])
>>> prec, _, _, _ = precision_recall_fscore_support(y_true, y_pred)
>>> prec
array([0.5       , 0.66666667, 0.5       ])
From the scikit-learn documentation (the confusion matrix plot example):
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
where cm is the confusion matrix as provided by sklearn.
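Note that if your scikit-learn version is 0.22 or newer, confusion_matrix can also do the normalization for you via its normalize argument; a minimal sketch, reusing y_true and y_pred from above:

>>> from sklearn.metrics import confusion_matrix
>>> confusion_matrix(y_true, y_pred, normalize='true')
array([[0.33333333, 0.33333333, 0.33333333],
       [0.33333333, 0.66666667, 0.        ],
       [0.        , 0.        , 1.        ]])

normalize='true' makes each row sum to 1 (recall on the diagonal); 'pred' normalizes the columns (precision on the diagonal), and 'all' normalizes by the total number of samples.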