
Is it OK if the false positive rate in a ROC curve does not end in 1.0?

I have the following ROC Curve:

[Image: ROC curve]

And it does not end in 1.0 because my predictions include zeros, for example

prediction = [0.9, 0.1, 0.8, 0.0]

For the ROC curve, I take the top-k predictions: first {0.9}, then {0.9, 0.8}, and so on. Once no values > 0 remain in the prediction, the predicted set no longer changes as k increases.

So the true negative count never reaches zero, and since the false positive rate is fp/(fp+tn), the curve ends before it reaches 1.

Now, should I artificially use the zeros for predictions as well, or is it OK if the curve just ends like that? It feels wrong to use the zeros as well. Or am I missing something?
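
To make the situation concrete, here is a minimal sketch of the top-k procedure described above. The ground-truth `labels` and the helper name `topk_roc_points` are assumptions made up for illustration (the question only gives the prediction vector), and the code is one plausible reading of the procedure, not necessarily the exact one used.

```python
def topk_roc_points(scores, labels):
    """Return (FPR, TPR) points for growing top-k sets, skipping zero scores."""
    # Rank items by score, highest first, and drop items scored 0
    # (the behaviour described in the question).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    order = [i for i in order if scores[i] > 0]

    positives = sum(labels)
    negatives = len(labels) - positives

    points = []
    tp = fp = 0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # FPR = fp / (fp + tn) and fp + tn equals the number of actual negatives.
        points.append((fp / negatives, tp / positives))
    return points


prediction = [0.9, 0.1, 0.8, 0.0]
labels = [1, 0, 1, 0]  # hypothetical ground truth, not from the question

truncated = topk_roc_points(prediction, labels)
print(truncated)  # [(0.0, 0.5), (0.0, 1.0), (0.5, 1.0)]
```

With these assumed labels the last point has a false positive rate of 0.5: the item scored 0.0 is a negative that is never predicted, so tn stays at 1.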

Puckl asked Aug 31 '12 14:08





1 Answer

The ROC curve shows the possible tradeoffs between false positives and false negatives when setting the threshold at different values. On one extreme, you can set the threshold so low that you label everything as positive, giving you a false negative rate of 0 (a true positive rate of 1) and a false positive rate of 1. On the other extreme, you can set the threshold so high that you label everything as negative, giving you a false negative rate of 1 (a true positive rate of 0) and a false positive rate of 0.

While these degenerate cases are not useful in practice, they are still theoretically valid tradeoffs and are a normal part of the ROC curve.
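
As a rough illustration of the two extremes, the sketch below sweeps a threshold over the example scores from the question and includes both degenerate settings. The `labels` and the function name `roc_points` are hypothetical, chosen only to make the example runnable.

```python
import math


def roc_points(scores, labels):
    """Return (FPR, TPR) points, one per threshold, from strictest to loosest."""
    positives = sum(labels)
    negatives = len(labels) - positives

    # math.inf labels everything negative; the smallest score labels
    # everything positive (an item is positive when score >= threshold).
    thresholds = [math.inf] + sorted(set(scores), reverse=True)

    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


scores = [0.9, 0.1, 0.8, 0.0]
labels = [1, 0, 1, 0]  # hypothetical ground truth, not from the question

curve = roc_points(scores, labels)
print(curve[0])   # (0.0, 0.0): everything labelled negative
print(curve[-1])  # (1.0, 1.0): everything labelled positive
```

Including the threshold at the lowest score (here 0.0) is what takes the curve to a false positive rate of 1, because the items scored zero are then counted as predicted positives.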

Antimony answered Sep 21 '22 23:09
