
How to compute a confusion matrix for multiclass classification in scikit-learn?

I have a multiclass classification task. When I run my script, based on the scikit-learn example, as follows:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix

classifier = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))

y_pred = classifier.fit(X_train, y_train).predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)

I get this error:

File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
    raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported

I tried passing labels=classifier.classes_ to confusion_matrix(), but it doesn't help.

y_test and y_pred are as follows:

y_test =
array([[0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 1, 0, 0, 0, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0]])

y_pred =
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0]])
asked Apr 27 '17 by YNR


People also ask

How do you calculate multiclass of a confusion matrix?

A confusion matrix gives a comparison between actual and predicted values. The confusion matrix is an N x N matrix, where N is the number of classes or outputs. For 2 classes, we get a 2 x 2 confusion matrix; for 3 classes, we get a 3 x 3 confusion matrix.

How do you calculate accuracy from confusion matrix for multiclass?

Accuracy is one of the most popular metrics in multi-class classification, and it is computed directly from the confusion matrix. The accuracy formula puts the sum of the true positive and true negative elements (the diagonal of the matrix) in the numerator, and the sum of all entries of the confusion matrix in the denominator.
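That formula can be sketched in a few lines of NumPy; the 3x3 matrix below is made-up data for illustration only:

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = actual, columns = predicted)
cm = np.array([[5, 1, 0],
               [2, 6, 1],
               [0, 1, 4]])

# Accuracy = sum of the diagonal (correct predictions) / sum of all entries
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.75
```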

What is Confusion_matrix in Scikit learn?

A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It can be used to evaluate the performance of a classification model through the calculation of performance metrics like accuracy, precision, recall, and F1-score.
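A minimal sketch of scikit-learn's confusion_matrix on class labels (the y_true/y_pred values here are invented for illustration; the labels parameter fixes the row/column order):

```python
from sklearn.metrics import confusion_matrix

# Invented example labels, not from the question above
y_true = ['cat', 'dog', 'cat', 'house', 'dog', 'dog']
y_pred = ['cat', 'dog', 'dog', 'house', 'dog', 'cat']

# Rows are actual classes, columns are predicted classes,
# both ordered as given in `labels`
cm = confusion_matrix(y_true, y_pred, labels=['cat', 'dog', 'house'])
print(cm)
```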


2 Answers

This worked for me:

import numpy as np
from sklearn.metrics import confusion_matrix

y_test_non_category = [np.argmax(t) for t in y_test]
y_predict_non_category = [np.argmax(t) for t in y_predict]

conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)

where y_test and y_predict are categorical variables, i.e. one-hot (indicator) arrays.
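The same conversion can be done without a Python loop by using NumPy's axis argument on argmax. A self-contained sketch with small made-up one-hot arrays standing in for y_test and y_predict:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up one-hot arrays standing in for the real y_test / y_predict
y_test = np.array([[0, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0]])
y_predict = np.array([[0, 0, 1],
                      [1, 0, 0],
                      [1, 0, 0]])

# argmax along axis=1 maps each one-hot row to its class index
conf_mat = confusion_matrix(y_test.argmax(axis=1), y_predict.argmax(axis=1))
print(conf_mat)
```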

answered Sep 30 '22 by Azhar Khan


First you need to create the label output array. Let's say you have 3 classes, 'cat', 'dog', 'house', indexed 0, 1, 2, and the prediction for 2 samples is 'dog', 'house'. Your output will be:

y_pred = [[0, 1, 0], [0, 0, 1]]

Run y_pred.argmax(1) to get [1, 2]. This array holds the original label indexes, meaning ['dog', 'house']:

import numpy as np
from keras.utils import np_utils

num_classes = 3

# from label to categorical (one-hot)
y_prediction = np.array([1, 2])
y_categorical = np_utils.to_categorical(y_prediction, num_classes)

# from categorical back to label indexing
y_pred = y_categorical.argmax(1)
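If Keras isn't available, the same label-to-one-hot round trip can be reproduced with plain NumPy. A sketch assuming integer class labels, where indexing an identity matrix plays the role of to_categorical:

```python
import numpy as np

num_classes = 3
y_prediction = np.array([1, 2])

# label index -> one-hot row (stands in for keras' to_categorical)
y_categorical = np.eye(num_classes, dtype=int)[y_prediction]
print(y_categorical)  # [[0 1 0]
                      #  [0 0 1]]

# one-hot row -> label index
y_back = y_categorical.argmax(axis=1)
print(y_back)  # [1 2]
```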
answered Sep 30 '22 by Naomi Fridman