
Scikit-learn confusion matrix

I can't figure out whether I've set up my binary classification problem correctly. I labeled the positive class 1 and the negative class 0. However, it is my understanding that by default scikit-learn uses class 0 as the positive class in its confusion matrix (so the inverse of how I set it up). This is confusing to me. Is the top row, in scikit-learn's default setting, the positive or the negative class? Let's assume the confusion matrix output:

confusion_matrix(y_test, preds)
[[30  5]
 [ 2 42]]

What would this look like as a confusion matrix? Are the actual instances the rows or the columns in scikit-learn?

              prediction                          prediction
              0       1                           1       0
            -----   -----                       -----   -----
         0 | TN   |  FP         (OR)         1 | TP   |  FP
actual      -----   -----             actual    -----   -----
         1 | FN   |  TP                      0 | FN   |  TN
asked Feb 03 '16 by OAK


People also ask

What is confusion matrix in sklearn?

A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It can be used to evaluate the performance of a classification model through the calculation of performance metrics like accuracy, precision, recall, and F1-score.
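
For example, here is a minimal sketch (with made-up labels) of computing those metrics alongside the confusion matrix:

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # actual labels (made up)
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]   # predicted labels (made up)

print(confusion_matrix(y_true, y_pred))   # [[3 1]
                                          #  [1 3]]
print(accuracy_score(y_true, y_pred))     # 0.75
print(precision_score(y_true, y_pred))    # 0.75  (TP / (TP + FP))
print(recall_score(y_true, y_pred))       # 0.75  (TP / (TP + FN))
print(f1_score(y_true, y_pred))           # 0.75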

How do you make a confusion matrix in sklearn?

In order to create the confusion matrix we need to import metrics from the sklearn module. Once metrics is imported we can use the confusion matrix function on our actual and predicted values. To create a more interpretable visual display we need to convert the table into a confusion matrix display.
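
A minimal sketch of those steps, assuming a scikit-learn version that ships ConfusionMatrixDisplay (the label lists below are made up):

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

actual    = [1, 0, 0, 1, 1, 0]   # made-up ground truth
predicted = [1, 0, 1, 1, 0, 0]   # made-up predictions

cm = confusion_matrix(actual, predicted)   # rows = actual, columns = predicted
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
disp.plot()                                # renders the matrix as a colored grid
plt.show()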

How do you show a confusion matrix in python?

All you need to do is import the method plot_confusion_matrix and pass the confusion matrix array to the parameter conf_mat. A green color map is then used to shade the displayed confusion matrix.
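
The plot_confusion_matrix/conf_mat pairing described above matches the mlxtend plotting helper rather than scikit-learn itself; assuming that library is installed, a sketch might look like:

import numpy as np
from mlxtend.plotting import plot_confusion_matrix   # assumes mlxtend is installed
import matplotlib.pyplot as plt

cm = np.array([[30, 5],
               [2, 42]])                       # a precomputed confusion matrix (rows = actual)

fig, ax = plot_confusion_matrix(conf_mat=cm)   # pass the array via conf_mat
plt.show()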


2 Answers

scikit-learn sorts labels in ascending order, so 0s occupy the first row/column and 1s the second:

>>> from sklearn.metrics import confusion_matrix as cm
>>> y_test = [1, 0, 0]
>>> y_pred = [1, 0, 0]
>>> cm(y_test, y_pred)
array([[2, 0],
       [0, 1]])
>>> y_pred = [4, 0, 0]
>>> y_test = [4, 0, 0]
>>> cm(y_test, y_pred)
array([[2, 0],
       [0, 1]])
>>> y_test = [-2, 0, 0]
>>> y_pred = [-2, 0, 0]
>>> cm(y_test, y_pred)
array([[1, 0],
       [0, 2]])

This is written in the docs:

labels : array, shape = [n_classes], optional
    List of labels to index the matrix. This may be used to reorder or select a subset of labels. If none is given, those that appear at least once in y_true or y_pred are used in sorted order.

Thus you can alter this behavior by passing the labels argument to the confusion_matrix call:

>>> y_test = [1, 0, 0]
>>> y_pred = [1, 0, 0]
>>> cm(y_test, y_pred)
array([[2, 0],
       [0, 1]])
>>> cm(y_test, y_pred, labels=[1, 0])
array([[1, 0],
       [0, 2]])

And actual/predicted values are ordered just as in your images: predictions are in columns and actual values in rows.

>>> y_test = [5, 5, 5, 0, 0, 0]
>>> y_pred = [5, 0, 0, 0, 0, 0]
>>> cm(y_test, y_pred)
array([[3, 0],
       [2, 1]])
  • true: 0, predicted: 0 (value: 3, position [0, 0])
  • true: 5, predicted: 0 (value: 2, position [1, 0])
  • true: 0, predicted: 5 (value: 0, position [0, 1])
  • true: 5, predicted: 5 (value: 1, position [1, 1])
answered Sep 20 '22 by lejlot


Supporting Answer:

When reading the confusion matrix values produced by sklearn.metrics, be aware that the order of the values is

[ True Negative    False Positive ]
[ False Negative   True Positive  ]
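
One way to avoid misreading the cells is to unpack them explicitly with ravel(), which for a binary problem in the default label order yields tn, fp, fn, tp (the labels below are made up):

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 0, 1]             # made-up labels
y_pred = [1, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)             # 0 2 1 1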

If you interpret the values the wrong way round, say mistaking TP for TN, your accuracy and AUC-ROC will more or less match, but your precision, recall, sensitivity, and F1-score will take a hit and you will end up with completely different metrics. This will result in a false judgement of your model's performance.
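
As a small sketch using the matrix from the question (1 treated as the positive class), accuracy is unaffected by swapping TP and TN, while precision and recall change:

import numpy as np

cm = np.array([[30, 5],
               [2, 42]])                  # [[TN, FP], [FN, TP]] in sklearn's order

tn, fp, fn, tp = cm.ravel()
print((tp + tn) / cm.sum())               # accuracy  = 0.911...
print(tp / (tp + fp))                      # precision = 0.893...
print(tp / (tp + fn))                      # recall    = 0.954...

# Misreading the layout as [[TP, FP], [FN, TN]] swaps TP and TN:
tp_bad, fp_bad, fn_bad, tn_bad = cm.ravel()
print((tp_bad + tn_bad) / cm.sum())        # accuracy still 0.911... (symmetric)
print(tp_bad / (tp_bad + fp_bad))          # "precision" = 0.857... (wrong)
print(tp_bad / (tp_bad + fn_bad))          # "recall"    = 0.937... (wrong)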

Do make sure to clearly identify what the 1 and 0 in your model represent, since this heavily dictates how the confusion matrix should be read.

Experience:

I was working on predicting fraud (binary supervised classification), where fraud was denoted by 1 and non-fraud by 0. My model was trained on a scaled-up, perfectly balanced data set, so during in-time testing the confusion matrix values did not look suspicious when I read the results in the order [TP FP] [FN TN].

Later, when I had to perform an out-of-time test on a new, imbalanced test set, I realized that the above order of the confusion matrix was wrong and differed from the one on sklearn's documentation page, which gives the order as tn, fp, fn, tp. Plugging in the correct order made me realize the blunder and the difference it had caused in my judgement of the model's performance.

answered Sep 19 '22 by Vinad