
roc_auc_score - Only one class present in y_true


I am doing k-fold cross-validation on an existing dataframe, and I need to get the AUC score. The problem is that sometimes the test data contains only 0s, and no 1s!

I tried using this example, but with different numbers:

import numpy as np
from sklearn.metrics import roc_auc_score

# y_true contains only one class (0)
y_true = np.array([0, 0, 0, 0])
y_scores = np.array([1, 0, 0, 0])
roc_auc_score(y_true, y_scores)

And I get this exception:

ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

Is there any workaround that can make it work in such cases?

asked Jul 17 '17 by bloop

People also ask

What is the range of Roc_auc_score?

The roc_auc_score always lies between 0 and 1, and it measures how well the model ranks predictions. 0.5 is the baseline for random guessing, so you want to score above 0.5.

What does Roc_auc_score return?

The AUC for the ROC can be calculated using the roc_auc_score() function. Like the roc_curve() function, it takes both the true outcomes (0, 1) from the test set and the predicted probabilities for the 1 class. It returns a score between 0.0 and 1.0, where 0.5 indicates no skill and 1.0 a perfect model.
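A minimal sketch of both functions on toy data (the numbers here are made up for illustration):

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy ground truth and predicted probabilities for the positive class
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(roc_auc_score(y_true, y_scores))  # 0.75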

What is AUC value?

The Area Under the Curve (AUC) is the measure of the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve. The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes.

What is ROC machine learning?

The ROC is also known as a relative operating characteristic curve, because it compares two operating characteristics, the True Positive Rate and the False Positive Rate, as the decision criterion changes. An ideal classifier has a ROC curve that reaches a true positive rate of 100% with zero false positives.


2 Answers

You could use try-except to prevent the error:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 0, 0])
y_scores = np.array([1, 0, 0, 0])

try:
    roc_auc_score(y_true, y_scores)
except ValueError:
    # y_true holds a single class, so the AUC is undefined; skip this fold
    pass

You could also set the roc_auc_score to zero when only one class is present. However, I wouldn't do this. Your test data is most likely highly imbalanced, so I would suggest using stratified k-fold instead, so that both classes are present in every fold; see the sketch below.
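A minimal sketch of that suggestion, assuming hypothetical features X and labels y standing in for your dataframe, and a stand-in LogisticRegression model (swap in your own estimator):

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical data standing in for your dataframe's features and labels
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)

# Stratified splits keep the class proportions roughly equal in each fold,
# so every test fold contains both classes and the AUC is always defined
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    y_scores = model.predict_proba(X[test_idx])[:, 1]
    print(roc_auc_score(y[test_idx], y_scores))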

answered Sep 18 '22 by Dat Tran


As the error notes, if a class is not present in the ground truth of a batch,

ROC AUC score is not defined in that case.

I'm against either throwing an exception (about what? This is the expected behaviour) or returning another metric (e.g. accuracy). The metric is not broken per se.

I don't feel like solving a data imbalance "issue" with a metric "fix". It would probably be better to use a different sampling strategy, if possible, or simply to join multiple batches so that both classes are represented, as sketched below.
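One way to read "join multiple batches": collect the out-of-fold predictions from every fold and compute a single AUC over all of them at the end. A sketch under the same hypothetical X, y, and LogisticRegression assumptions as above:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X = np.random.rand(100, 5)        # hypothetical features
y = np.random.randint(0, 2, 100)  # hypothetical binary labels

# Store every sample's out-of-fold prediction, then score once at the end
oof_scores = np.zeros(len(y))
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    oof_scores[test_idx] = model.predict_proba(X[test_idx])[:, 1]

# Defined as long as y as a whole contains both classes
print(roc_auc_score(y, oof_scores))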

answered Sep 22 '22 by Diego Ferri