 

sklearn roc_auc_score with multi_class=="ovr" should have None average available

I'm trying to compute the AUC score for a multiclass problem using sklearn's roc_auc_score() function.

I have a prediction matrix of shape [n_samples, n_classes] and a ground-truth vector of shape [n_samples], named np_pred and np_label respectively.

What I'm trying to achieve is the set of AUC scores, one for each class that I have.

To do so I would like to set the average parameter to None and the multi_class parameter to "ovr", but if I run

roc_auc_score(y_score=np_pred, y_true=np_label, multi_class="ovr",average=None)

I get back

ValueError: average must be one of ('macro', 'weighted') for multiclass problems

This error is expected from the sklearn function in the multiclass case; but if you take a look at the roc_auc_score source code, you can see that when the multi_class parameter is set to "ovr" and average is one of the accepted values, the multiclass case is delegated to the internal multilabel function, and that function does accept None as the average parameter.

So, judging by the code, it seems that I should be able to run a multiclass problem with a None average in a one-vs-rest setting, but the validation checks in the source code do not allow that combination.

Am I wrong?

In case I'm wrong, from a theoretical point of view, should I fake a multilabel case just to get the different AUCs for the different classes, or should I write my own function that cycles through the classes and outputs the AUCs?

Thanks
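The "fake a multilabel case" route mentioned in the question does work in practice: binarize the ground truth so that roc_auc_score takes its multilabel path, where average=None is accepted and one AUC per class is returned. A minimal sketch, using synthetic stand-ins for np_pred and np_label (the random data here is only an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_auc_score

# Synthetic stand-ins for np_pred / np_label from the question
rng = np.random.RandomState(0)
n_samples, n_classes = 100, 3
np_label = rng.randint(0, n_classes, size=n_samples)
np_pred = rng.rand(n_samples, n_classes)
np_pred /= np_pred.sum(axis=1, keepdims=True)  # rows sum to 1, like predict_proba

# Binarize the ground truth into an indicator matrix of shape
# [n_samples, n_classes]; sklearn then treats this as multilabel,
# where average=None is allowed.
np_label_bin = label_binarize(np_label, classes=np.arange(n_classes))
per_class_auc = roc_auc_score(np_label_bin, np_pred, average=None)
```

per_class_auc is then an array with one one-vs-rest AUC per class.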

Asked Jan 09 '20 by Dario Mantegazza

People also ask

What is the range of roc_auc_score?

The roc_auc_score always ranges from 0 to 1 and measures how well the model ranks predicted probabilities. 0.5 is the baseline for random guessing, so you always want to score above 0.5.

What is roc_auc_score in Python?

roc_auc_score is defined as the area under the ROC curve, which is the curve that plots the False Positive Rate on the x-axis against the True Positive Rate on the y-axis across all classification thresholds. It's impossible to calculate FPR and TPR for regression methods, so this metric doesn't apply to them.

Can you calculate AUC for multiclass?

How do AUC ROC plots work for multiclass models? For multiclass problems, ROC curves can be plotted using the one-versus-rest methodology: treat each class in turn as the positive class against all the others, and you will have as many curves as classes. The AUC score can also be calculated for each class individually.

How do you interpret ROC AUC scores?

The Area Under the Curve (AUC) is the measure of the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve. The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes.


1 Answer

As you already know, sklearn's multiclass roc_auc_score currently only supports the macro and weighted averages. The per-class scores can still be obtained, though, by implementing one-vs-rest yourself.

Theoretically speaking, you could implement OvR and calculate a per-class roc_auc_score, as:

roc = {}
for label in multi_class_series.unique():
    # Fit a binary classifier: 1 for the current class, 0 for the rest
    selected_classifier.fit(train_set_dataframe, train_class == label)
    predictions_proba = selected_classifier.predict_proba(test_set_dataframe)
    # Column 1 is the probability of the positive (current) class
    roc[label] = roc_auc_score(test_class == label, predictions_proba[:, 1])
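A runnable sketch of this one-vs-rest loop, using a LogisticRegression on synthetic data as stand-ins for the answer's classifier and dataframes (all data and names here are assumptions for illustration, not from the original):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the answer's train/test dataframes
rng = np.random.RandomState(0)
X = rng.rand(300, 5)
y = rng.randint(0, 3, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

roc = {}
for label in np.unique(y):
    # Binary target: True for the current class, False for the rest
    clf = LogisticRegression().fit(X_train, y_train == label)
    proba = clf.predict_proba(X_test)[:, 1]  # probability of the current class
    roc[label] = roc_auc_score(y_test == label, proba)
```

Any classifier exposing predict_proba can be dropped in for LogisticRegression; the loop refits one binary model per class.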
Answered Sep 23 '22 by shaivikochar