Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write a confusion matrix in Python?

I wrote a confusion matrix calculation code in Python:

def conf_mat(prob_arr, input_arr):         # confusion matrix         conf_arr = [[0, 0], [0, 0]]          for i in range(len(prob_arr)):                 if int(input_arr[i]) == 1:                         if float(prob_arr[i]) < 0.5:                                 conf_arr[0][1] = conf_arr[0][1] + 1                         else:                                 conf_arr[0][0] = conf_arr[0][0] + 1                 elif int(input_arr[i]) == 2:                         if float(prob_arr[i]) >= 0.5:                                 conf_arr[1][0] = conf_arr[1][0] +1                         else:                                 conf_arr[1][1] = conf_arr[1][1] +1          accuracy = float(conf_arr[0][0] + conf_arr[1][1])/(len(input_arr)) 

prob_arr is an array that my classification code returned and a sample array is like this:

 [1.0, 1.0, 1.0, 0.41592955657342651, 1.0, 0.0053405015805891975, 4.5321494433440449e-299, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.70943426182688163, 1.0, 1.0, 1.0, 1.0] 

input_arr is the original class labels for a dataset and it is like this:

[2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1] 

What my code is trying to do is: i get prob_arr and input_arr and for each class (1 and 2) I check if they are misclassified or not.

But my code only works for two classes. If I run this code for a multiple classed data, it doesn't work. How can I make this for multiple classes?

For example, for a data set with three classes, it should return: [[21,7,3],[3,38,6],[5,4,19]]

like image 923
Arja Varvio Avatar asked Jan 27 '10 16:01

Arja Varvio


People also ask

How do you make a confusion matrix in python without Sklearn?

You can derive the confusion matrix by counting the number of instances in each combination of actual and predicted classes as follows: import numpy as np def comp_confmat(actual, predicted): # extract the different classes classes = np. unique(actual) # initialize the confusion matrix confmat = np.


2 Answers

Scikit-Learn provides a confusion_matrix function

from sklearn.metrics import confusion_matrix y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2] y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2] confusion_matrix(y_actu, y_pred) 

which output a Numpy array

array([[3, 0, 0],        [0, 1, 2],        [2, 1, 3]]) 

But you can also create a confusion matrix using Pandas:

import pandas as pd y_actu = pd.Series([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2], name='Actual') y_pred = pd.Series([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2], name='Predicted') df_confusion = pd.crosstab(y_actu, y_pred) 

You will get a (nicely labeled) Pandas DataFrame:

Predicted  0  1  2 Actual 0          3  0  0 1          0  1  2 2          2  1  3 

If you add margins=True like

df_confusion = pd.crosstab(y_actu, y_pred, rownames=['Actual'], colnames=['Predicted'], margins=True) 

you will get also sum for each row and column:

Predicted  0  1  2  All Actual 0          3  0  0    3 1          0  1  2    3 2          2  1  3    6 All        5  2  5   12 

You can also get a normalized confusion matrix using:

df_conf_norm = df_confusion / df_confusion.sum(axis=1)  Predicted         0         1         2 Actual 0          1.000000  0.000000  0.000000 1          0.000000  0.333333  0.333333 2          0.666667  0.333333  0.500000 

You can plot this confusion_matrix using

import matplotlib.pyplot as plt def plot_confusion_matrix(df_confusion, title='Confusion matrix', cmap=plt.cm.gray_r):     plt.matshow(df_confusion, cmap=cmap) # imshow     #plt.title(title)     plt.colorbar()     tick_marks = np.arange(len(df_confusion.columns))     plt.xticks(tick_marks, df_confusion.columns, rotation=45)     plt.yticks(tick_marks, df_confusion.index)     #plt.tight_layout()     plt.ylabel(df_confusion.index.name)     plt.xlabel(df_confusion.columns.name)  plot_confusion_matrix(df_confusion) 

plot confusion matrix

Or plot normalized confusion matrix using:

plot_confusion_matrix(df_conf_norm)   

plot confusion matrix normalized

You might also be interested by this project https://github.com/pandas-ml/pandas-ml and its Pip package https://pypi.python.org/pypi/pandas_ml

With this package confusion matrix can be pretty-printed, plot. You can binarize a confusion matrix, get class statistics such as TP, TN, FP, FN, ACC, TPR, FPR, FNR, TNR (SPC), LR+, LR-, DOR, PPV, FDR, FOR, NPV and some overall statistics

In [1]: from pandas_ml import ConfusionMatrix In [2]: y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2] In [3]: y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2] In [4]: cm = ConfusionMatrix(y_actu, y_pred) In [5]: cm.print_stats() Confusion Matrix:  Predicted  0  1  2  __all__ Actual 0          3  0  0        3 1          0  1  2        3 2          2  1  3        6 __all__    5  2  5       12   Overall Statistics:  Accuracy: 0.583333333333 95% CI: (0.27666968568210581, 0.84834777019156982) No Information Rate: ToDo P-Value [Acc > NIR]: 0.189264302376 Kappa: 0.354838709677 Mcnemar's Test P-Value: ToDo   Class Statistics:  Classes                                        0          1          2 Population                                    12         12         12 P: Condition positive                          3          3          6 N: Condition negative                          9          9          6 Test outcome positive                          5          2          5 Test outcome negative                          7         10          7 TP: True Positive                              3          1          3 TN: True Negative                              7          8          4 FP: False Positive                             2          1          2 FN: False Negative                             0          2          3 TPR: (Sensitivity, hit rate, recall)           1  0.3333333        0.5 TNR=SPC: (Specificity)                 0.7777778  0.8888889  0.6666667 PPV: Pos Pred Value (Precision)              0.6        0.5        0.6 NPV: Neg Pred Value                            1        0.8  0.5714286 FPR: False-out                         0.2222222  0.1111111  0.3333333 FDR: False Discovery Rate                    0.4        0.5        0.4 FNR: Miss Rate                                 0  0.6666667        0.5 ACC: Accuracy                          0.8333333       0.75  0.5833333 F1 score                                    0.75        0.4  0.5454545 MCC: Matthews correlation coefficient  0.6831301  0.2581989  0.1690309 Informedness                           0.7777778  0.2222222  0.1666667 Markedness                                   0.6        0.3  0.1714286 Prevalence                                  0.25       0.25        0.5 LR+: Positive likelihood ratio               4.5          3        1.5 LR-: Negative likelihood ratio                 0       0.75       0.75 DOR: Diagnostic odds ratio                   inf          4          2 FOR: False omission rate                       0        0.2  0.4285714 

I noticed that a new Python library about Confusion Matrix named PyCM is out: maybe you can have a look.

like image 82
scls Avatar answered Oct 06 '22 23:10

scls


Nearly a decade has passed, yet the solutions (without sklearn) to this post are convoluted and unnecessarily long. Computing a confusion matrix can be done cleanly in Python in a few lines. For example:

import numpy as np  def compute_confusion_matrix(true, pred):   '''Computes a confusion matrix using numpy for two np.arrays   true and pred.    Results are identical (and similar in computation time) to:      "from sklearn.metrics import confusion_matrix"    However, this function avoids the dependency on sklearn.'''    K = len(np.unique(true)) # Number of classes    result = np.zeros((K, K))    for i in range(len(true)):     result[true[i]][pred[i]] += 1    return result 
like image 28
cgnorthcutt Avatar answered Oct 07 '22 00:10

cgnorthcutt