
What is the threshold for the sklearn roc_auc_score

In my classification problem, I want to check whether my model performs well, so I computed roc_auc_score as a measure of its quality and got the value 0.9856825361839688.

My question follows the code below.

This is my code:

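from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Build a two-class synthetic dataset and hold out 30% for testing.
x, y = make_classification(n_samples=2000, n_classes=2, weights=[1, 1], random_state=24)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=43)

# Fit k-nearest neighbours and score the training set with class probabilities.
knn_classifier = KNeighborsClassifier()
knn_classifier.fit(x_train, y_train)
ytrain_pred = knn_classifier.predict_proba(x_train)
print('train roc-auc: {}'.format(roc_auc_score(y_train, ytrain_pred[:, 1])))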

train roc-auc: 0.9856825361839688

Now I plot the ROC curve to check the best score:

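import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import roc_curve

# Compute the ROC curve on the training set and plot TPR against FPR.
fpr_1, tpr_1, thresholds_1 = roc_curve(y_train, ytrain_pred[:, 1])
fig, ax = plt.subplots(1, 1, figsize=(15, 7))
g = sns.lineplot(x=fpr_1, y=tpr_1, ax=ax, color='green')
g.set_xlabel('False Positive Rate')
g.set_ylabel('True Positive Rate')
g.set(xlim=(0, 0.8))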

[image: ROC curve plot]

From the plot I can see that TPR reaches its maximum at an FPR of about 0.2. Given the roc_auc_score I got, should I conclude that the function used 0.2 as the threshold?

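I explicitly calculated the accuracy score for each threshold:

import pandas as pd

# accuracy_ls holds one accuracy score per threshold (its computation is not shown here).
_result = pd.concat([pd.Series(thresholds_1), pd.Series(accuracy_ls)], axis=1)
_result.columns = ['threshold', 'accuracy score']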

[image: table of thresholds and accuracy scores]
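Since the computation of accuracy_ls is not shown, here is one way it might have been built. This is only a sketch: the data uses make_classification with default (balanced) weights rather than the exact call above, and the names knn, scores and best are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: balanced synthetic data, similar in spirit to the question.
x, y = make_classification(n_samples=2000, n_classes=2, random_state=24)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=43
)

knn = KNeighborsClassifier().fit(x_train, y_train)
scores = knn.predict_proba(x_train)[:, 1]

# roc_curve returns the thresholds at which the curve changes.
fpr, tpr, thresholds = roc_curve(y_train, scores)

# Hypothetical reconstruction of accuracy_ls: one accuracy per threshold,
# predicting class 1 whenever the score is at or above the threshold.
accuracy_ls = [
    accuracy_score(y_train, (scores >= t).astype(int)) for t in thresholds
]

best = int(np.argmax(accuracy_ls))
print("threshold with the highest accuracy:", thresholds[best])
print("accuracy at that threshold:", accuracy_ls[best])
```

Accuracy changes from row to row here because each row uses a different cut-off, which is what the table above shows as well.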

So, should I conclude that roc_auc_score gives the highest score regardless of the threshold?

Lijin Durairaj asked Nov 24 '25 11:11


1 Answer

The function roc_auc_score is used to evaluate a classifier. It returns the area under the ROC curve (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html).

roc_auc_score == 1: ideal classifier.

For binary classification with an equal number of samples for both classes in the evaluated dataset, roc_auc_score == 0.5: random classifier.
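To make these two reference values concrete, here is a small sketch on made-up labels and scores (none of this comes from the question's data; the variable names are just for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=20000)  # roughly balanced made-up labels

# Ideal classifier: every positive is scored above every negative.
perfect_scores = y_true.astype(float)
auc_perfect = roc_auc_score(y_true, perfect_scores)

# Random classifier: the scores carry no information about the label.
random_scores = rng.random(size=20000)
auc_random = roc_auc_score(y_true, random_scores)

print(auc_perfect)  # 1.0
print(auc_random)   # close to 0.5
```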

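This function does not pick a threshold or compare thresholds with each other. It takes the scores (here, the predicted probabilities) and summarizes the ROC curve over all possible thresholds in a single number.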
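As a rough illustration, the sketch below uses made-up scores (an assumption, not the question's model) to show that roc_auc_score matches the area computed from the full roc_curve, while accuracy has to be computed at one specific threshold and changes with it:

```python
import numpy as np
from sklearn.metrics import accuracy_score, auc, roc_auc_score, roc_curve

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)
# Made-up scores: positives are centred at 1, negatives at 0.
scores = rng.normal(loc=y_true, scale=0.5)

# The area is computed from the whole curve, i.e. from all thresholds at once.
fpr, tpr, thresholds = roc_curve(y_true, scores)
area_from_curve = auc(fpr, tpr)
area_direct = roc_auc_score(y_true, scores)
print("AUC from the curve:", area_from_curve)
print("roc_auc_score:     ", area_direct)

# Accuracy, in contrast, needs a threshold and depends on which one is used.
accuracy_at = {}
for t in (0.0, 0.5, 1.0):
    accuracy_at[t] = accuracy_score(y_true, (scores >= t).astype(int))
    print("threshold", t, "accuracy", accuracy_at[t])
```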

Which threshold is better is for you to decide, depending on the business problem you are trying to solve. What matters more to you: precision or recall?
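One possible way to turn that decision into code is sketched below. It assumes, purely for illustration, that the business requirement is a recall of at least 0.90, and then takes the threshold with the best precision among those that meet it. The data, min_recall and chosen_threshold are all made up.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=1000)
# Made-up scores: positives are centred at 1, negatives at 0.
scores = rng.normal(loc=y_true, scale=0.5)

precision, recall, thresholds = precision_recall_curve(y_true, scores)
# precision and recall have one more entry than thresholds;
# the final (precision=1, recall=0) point has no threshold, so drop it.
precision, recall = precision[:-1], recall[:-1]

min_recall = 0.90  # hypothetical business requirement
meets_recall = recall >= min_recall

# Among thresholds that meet the recall requirement, take the best precision.
best = int(np.argmax(np.where(meets_recall, precision, -1.0)))
chosen_threshold = thresholds[best]

print("chosen threshold:", chosen_threshold)
print("precision:", precision[best], "recall:", recall[best])
```

If precision matters more, the same idea works the other way round: require a minimum precision and take the threshold with the best recall.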

Danylo Baibak answered Nov 26 '25 01:11


