Scikit multi-class classification metrics, classification report

I am using scikit-learn 0.15.2 for a multi-class classification problem. When following examples such as "scikit 0.14 multi label metrics", I got many DeprecationWarnings like the one below, until I started using MultiLabelBinarizer:

"DeprecationWarning: Direct support for sequence of sequences multilabel representation will be unavailable from version 0.17. Use sklearn.preprocessing.MultiLabelBinarizer to convert to a label indicator representation."

However, I cannot find a way to get the classification report (with precision, recall, f-measure) to work with it, as was previously possible and is shown here: scikit 0.14 multi label metrics

I tried to use inverse_transform as shown below. This produces a classification_report, but it also brings back the warnings that this code will break from version 0.17 on.

How can I get measures for a multi-class classification problem?

Example code:

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

# Some simple data:

X_train = np.array([[0,0,0], [0,0,1], [0,1,0], [1,0,0], [1,1,1]])
y_train = [[1], [1], [1,2], [2], [2]]

# Use MultiLabelBinarizer and train a multi-class classifier:

mlb = MultiLabelBinarizer(sparse_output=True)
y_train_mlb = mlb.fit_transform(y_train)

clf = OneVsRestClassifier(LinearSVC())
clf.fit(X_train, y_train_mlb)

# classification_report, here I did not find a way to use y_train_mlb, 
# I am getting a lot of DeprecationWarnings

predictions_test = mlb.inverse_transform(clf.predict(X_train))
print(classification_report(y_train, predictions_test))

# Predict new example:

print(mlb.inverse_transform(clf.predict(np.array([[0, 1, 0]]))))
tkja asked May 14 '15 22:05



1 Answer

It seems like you have to run your classification report with the binarized labels:

print(classification_report(y_train_mlb, clf.predict(X_train)))
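
For reference, here is a minimal end-to-end sketch of that approach. It assumes Python 3 and a recent scikit-learn, neither of which is stated in the question. It reuses the question's data but leaves sparse_output at its default (a dense indicator matrix). Passing mlb.classes_ as target_names is my own addition to make the report rows readable, and the names y_pred_mlb, report, new_sample and predicted_labels are only illustrative.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Same toy data as in the question.
X_train = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1]])
y_train = [[1], [1], [1, 2], [2], [2]]

# Convert the label lists to a dense label indicator matrix
# (one column per label, 1 where the sample carries that label).
mlb = MultiLabelBinarizer()
y_train_mlb = mlb.fit_transform(y_train)

clf = OneVsRestClassifier(LinearSVC())
clf.fit(X_train, y_train_mlb)

# Predictions come back in the same indicator format as y_train_mlb.
y_pred_mlb = clf.predict(X_train)

# Compare indicator matrix against indicator matrix; no inverse_transform
# is needed for the report. target_names is an optional readability aid.
report = classification_report(
    y_train_mlb,
    y_pred_mlb,
    target_names=[str(label) for label in mlb.classes_],
)
print(report)

# inverse_transform is still useful for turning a prediction back into
# label tuples. Note that predict expects a 2D array (one row per sample).
new_sample = np.array([[0, 1, 0]])
predicted_labels = mlb.inverse_transform(clf.predict(new_sample))
print(predicted_labels)
```

The point is that both arguments to classification_report stay in the binarized form, so the deprecated sequence-of-sequences representation never reaches the metrics code.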
elachell answered Oct 21 '22 20:10