 

Should we plot the ROC curve for each class?

I'm doing binary classification on an imbalanced dataset, and I've used SVM class weights to try to mitigate the situation. I calculated and plotted the ROC curve for each class and got the following plot:

[image: ROC curves for the two classes]

It looks like the two classes sum up to one. I'm not sure if I'm doing the right thing, because it's the first time I've drawn my own ROC curve. I'm using scikit-learn to plot. Is it right to plot each class on its own, and is the classifier failing to classify the blue class?

This is the code that I used to get the plot:

import matplotlib.pyplot as plt
from sklearn import metrics

y_pred = clf.predict_proba(X_test)[:, 0]   # probability of the first class
y_pred2 = clf.predict_proba(X_test)[:, 1]  # probability of the second class

fpr, tpr, thresholds = metrics.roc_curve(y_test, y_pred)
auc = metrics.auc(fpr, tpr)
print("auc for the first class", auc)

fpr2, tpr2, thresholds2 = metrics.roc_curve(y_test, y_pred2)
auc2 = metrics.auc(fpr2, tpr2)
print("auc for the second class", auc2)

# plotting the ROC curves
plt.plot(fpr, tpr, label='first class')
plt.plot(fpr2, tpr2, label='second class')

plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.title('ROC curve')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend(loc="lower right")
plt.show()

I know there is a better way to write this, using a dictionary for example, but I was just trying to see the curves first.

Asked Oct 18 '22 by Ophilia


1 Answer

See the Wikipedia entry on ROC curves for all your ROC curve needs :)

predict_proba returns the class probabilities for each sample: the first column contains the probability of the first class and the second column contains the probability of the second class. Note that the two curves are rotated versions of each other. That is because the class probabilities sum up to 1.
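To see this numerically, here is a minimal sketch with made-up labels and probabilities (not the question's data): because the two columns are complements of each other, the two AUCs always sum to 1.

import numpy as np
from sklearn.metrics import roc_auc_score

# made-up ground truth and positive-class probabilities, for illustration only
y_true = np.array([0, 0, 1, 1, 0, 1])
p_pos = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

auc_pos = roc_auc_score(y_true, p_pos)        # scored by P(class 1)
auc_neg = roc_auc_score(y_true, 1.0 - p_pos)  # scored by P(class 0)
print(auc_pos, auc_neg, auc_pos + auc_neg)    # prints 0.888..., 0.111..., 1.0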

The documentation of roc_curve states that the second parameter must contain

Target scores, can either be probability estimates of the positive class or confidence values.

This means you have to pass the probabilities that correspond to class 1. Most likely this is the second column.
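If you do not want to rely on column order, the classes_ attribute of a fitted scikit-learn classifier gives the label order of the predict_proba columns. A minimal sketch, reusing clf, X_test, and y_test from the question:

import numpy as np
from sklearn import metrics

# predict_proba columns are ordered like clf.classes_,
# so look up the column of the positive label (here assumed to be 1)
pos_col = np.where(clf.classes_ == 1)[0][0]
y_score = clf.predict_proba(X_test)[:, pos_col]

fpr, tpr, thresholds = metrics.roc_curve(y_test, y_score)
print("AUC:", metrics.auc(fpr, tpr))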

You get the blue curve because you passed the probabilities of the wrong class (first column). Only the green curve is correct.

It does not make sense to compute ROC curves for each class, because the ROC curve describes the ability of the classifier to distinguish two classes. You have only one curve per classifier.

The specific problem here is a coding mistake.

predict_proba returns class probabilities (close to 1 if the sample almost certainly belongs to the class, close to 0 if it definitely does not, usually something in between).

metrics.roc_curve(y_test, y_pred) sweeps a decision threshold over these scores and compares the thresholded predictions against the true labels, so it has to be given the scores of the positive class.

Note that switching to predict would not help: hard class labels allow only a single operating point, so the "curve" degenerates to one point connected to (0, 0) and (1, 1). Either way, you only get one curve for the classifier, not one for each class.
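Here is a minimal sketch of that degeneracy on synthetic data (LogisticRegression stands in for the question's SVM; any probabilistic classifier behaves the same way):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# synthetic, slightly noisy binary data for illustration
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] + 0.5 * rng.randn(200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# Hard labels from predict: only one real threshold, three curve points
fpr_hard, tpr_hard, _ = roc_curve(y, clf.predict(X))
print(len(fpr_hard))   # 3 -> (0, 0), the single operating point, (1, 1)

# Probabilities of the positive class: a proper curve
fpr_prob, tpr_prob, _ = roc_curve(y, clf.predict_proba(X)[:, 1])
print(len(fpr_prob))   # many points, one per useful threshold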

Answered Oct 21 '22 by MB-F