
Scikit-learn : roc_auc_score

I am using the roc_auc_score function from scikit-learn to evaluate my model's performance. However, I get different values depending on whether I use predict() or predict_proba():

from sklearn.metrics import roc_curve, auc, roc_auc_score

p_pred = forest.predict_proba(x_test)      # class probabilities
y_test_predicted = forest.predict(x_test)  # hard 0/1 predictions
fpr, tpr, _ = roc_curve(y_test, p_pred[:, 1])
roc_auc = auc(fpr, tpr)

roc_auc_score(y_test, y_test_predicted)  # = 0.68
roc_auc_score(y_test, p_pred[:, 1])      # = 0.93

Could you advise on that please?

Thanks in advance

asked Jun 03 '15 by user4640449

1 Answer

First, look at the difference between predict and predict_proba. The former predicts the class label for each sample, whereas the latter predicts the probability of each class.

You are seeing the effect of the rounding that is implicit in the binary format of y_test_predicted: y_test_predicted consists of 1's and 0's, whereas p_pred consists of floating-point values between 0 and 1. The roc_auc_score routine sweeps the threshold value and generates true positive rates and false positive rates at each cutoff, so passing hard 0/1 predictions instead of probabilities gives a quite different score.
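
To see the relationship concretely: for a binary classifier with classes [0, 1], predict() is essentially predict_proba() thresholded at 0.5 (it takes the class with the larger probability). Here is a minimal, self-contained sketch on synthetic data; make_classification and the train/test split are just for illustration, not the questioner's data:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data and model, purely for illustration.
X, y = make_classification(n_samples=500, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(random_state=0).fit(x_train, y_train)

p_pred = forest.predict_proba(x_test)               # shape (n_samples, 2)
hard_from_proba = (p_pred[:, 1] > 0.5).astype(int)  # threshold at 0.5

# For classes [0, 1] this matches predict() (up to how exact 0.5 ties break).
print(np.array_equal(hard_from_proba, forest.predict(x_test)))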

Consider the case where:

y_test           = [ 1, 0, 0, 1, 0, 1, 1]
p_pred           = [.6,.4,.6,.9,.2,.7,.4]
y_test_predicted = [ 1, 0, 1, 1, 0, 1, 0]

Note that the ROC curve is generated by considering all cutoff thresholds. Now consider a threshold of 0.65...

The p_pred case gives:

TPR = 0.5, FPR = 0 (two of the four positives score above the cutoff, and none of the negatives do),

and the y_test_predicted case gives:

TPR = 0.75, FPR ≈ 0.33 (three of the four positives are predicted 1, along with one of the three negatives).

You can probably see that if these two points are different, then the area under the two curves will be quite different too.
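
If you want to check that arithmetic, here is a small sketch that computes the two (TPR, FPR) points at the 0.65 cutoff and the resulting AUCs directly on the toy arrays above; the tpr_fpr helper is just for this illustration:

import numpy as np
from sklearn.metrics import roc_auc_score

y_test           = np.array([ 1,  0,  0,  1,  0,  1,  1])
p_pred           = np.array([.6, .4, .6, .9, .2, .7, .4])
y_test_predicted = np.array([ 1,  0,  1,  1,  0,  1,  0])

def tpr_fpr(y_true, scores, threshold):
    # True/false positive rates at a single cutoff.
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return tp / np.sum(y_true == 1), fp / np.sum(y_true == 0)

print(tpr_fpr(y_test, p_pred, 0.65))            # (0.5, 0.0)
print(tpr_fpr(y_test, y_test_predicted, 0.65))  # (0.75, 0.333...)

# The areas differ accordingly: roughly 0.83 with the probabilities
# versus roughly 0.71 with the hard 0/1 predictions.
print(roc_auc_score(y_test, p_pred))
print(roc_auc_score(y_test, y_test_predicted))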

But to really see what is going on, I suggest plotting the two ROC curves themselves and comparing them.
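
For example, something along these lines plots both curves for the toy arrays from the previous sketch (matplotlib is assumed to be available):

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Probabilities give a curve built from several thresholds ...
fpr_p, tpr_p, _ = roc_curve(y_test, p_pred)
# ... while hard 0/1 predictions give only a single interior point.
fpr_b, tpr_b, _ = roc_curve(y_test, y_test_predicted)

plt.plot(fpr_p, tpr_p, marker='o', label='predict_proba')
plt.plot(fpr_b, tpr_b, marker='o', label='predict')
plt.plot([0, 1], [0, 1], linestyle='--', color='grey')  # chance line
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()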

Hope this helps!

answered Oct 14 '22 by AN6U5