
How to fix ROC curve with points below diagonal?

I am building receiver operating characteristic (ROC) curves to evaluate classifiers using the area under the curve (AUC) (more details at the end of the post). Unfortunately, points on the curve often go below the diagonal. For example, I end up with graphs that look like the one here (ROC curve in blue, identity line in grey):

[Image: Fixing the ROC]

The third point (0.3, 0.2) goes below the diagonal. To calculate AUC I want to fix such recalcitrant points.

The standard way to do this, for a point (fp, tp) on the curve, is to replace it with the point (1-fp, 1-tp), which is equivalent to swapping the predictions of the classifier at that threshold. For instance, in our example, the troublesome point A (0.3, 0.2) becomes point B (0.7, 0.8), which I have indicated in red in the image linked to above.
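To see why replacing (fp, tp) with (1-fp, 1-tp) amounts to swapping the predictions, here is a minimal Python sketch (Python rather than Matlab, purely for illustration). The confusion counts are hypothetical, chosen only so that they reproduce point A; the helper names `rates` and `invert_predictions` are made up for this example.

```python
def rates(tp, fn, fp, tn):
    """Return (false positive rate, true positive rate) from confusion counts."""
    return fp / (fp + tn), tp / (tp + fn)

def invert_predictions(tp, fn, fp, tn):
    """Swap every predicted label and return the new (tp, fn, fp, tn)."""
    # Former misses (fn) become hits and former hits (tp) become misses;
    # former correct rejections (tn) become false alarms and vice versa.
    return fn, tp, tn, fp

# Hypothetical counts (10 positives, 10 negatives) that give point A = (0.3, 0.2).
counts_a = (2, 8, 3, 7)  # tp, fn, fp, tn
counts_b = invert_predictions(*counts_a)

print("A:", rates(*counts_a))
print("B:", rates(*counts_b))
```

With these counts, A and B land on the two points from the question, which is all the (1-fp, 1-tp) rule is doing.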

This is about as far as my references go in treating this issue. The problem is that if you add the new point into a new ROC (and remove the bad point), you end up with a nonmonotonic ROC curve as shown (red is the new ROC curve, and dotted blue line is the old one):
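The loss of monotonicity can be checked mechanically. The sketch below is Python and uses a hypothetical five-point curve: only the third point (0.3, 0.2) comes from the example, the other coordinates are invented for illustration.

```python
def is_monotonic(curve):
    """True if neither fp nor tp ever decreases along the list of (fp, tp) points."""
    return all(
        fp1 <= fp2 and tp1 <= tp2
        for (fp1, tp1), (fp2, tp2) in zip(curve, curve[1:])
    )

# Hypothetical curve; only the third point (0.3, 0.2) is taken from the post.
roc = [(0.0, 0.0), (0.1, 0.15), (0.3, 0.2), (0.6, 0.7), (1.0, 1.0)]

# Replace point A in place with its reflection B = (1 - fp, 1 - tp).
swapped = list(roc)
swapped[2] = (1 - roc[2][0], 1 - roc[2][1])

print(is_monotonic(roc))      # original curve, in threshold order
print(is_monotonic(swapped))  # after the in-place swap
```

The swapped point keeps its old position in threshold order but now has a larger fp than its successor, so the curve doubles back on itself.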

New ROC

And here I am stuck. How can I fix this ROC curve?

Do I need to re-run my classifier with the data or classes somehow transformed to take into account this weird behavior? I have looked over a relevant paper, but if I am not mistaken, it seems to be addressing a slightly different problem than this.

In terms of some details: I still have all the original threshold values, fp values, and tp values (and the output of the original classifier for each data point, an output which is just a scalar from 0 to 1 that is a probability estimate of class membership). I am doing this in Matlab starting with the perfcurve function.
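Since the per-sample scores are available, it may help to contrast flipping a single ROC point with flipping the whole classifier. The Python sketch below (not Matlab, and not a substitute for perfcurve) computes AUC as the fraction of positive/negative pairs ranked correctly, which is the standard rank interpretation of AUC. The scores and labels are made-up toy data, and `auc_from_scores` is a hypothetical helper name.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]

auc = auc_from_scores(scores, labels)
auc_flipped = auc_from_scores([1 - s for s in scores], labels)
print(auc, auc_flipped, auc + auc_flipped)
```

Inverting every score reflects the entire curve and turns an AUC of a into 1 - a, so it only helps when the whole curve sits below the diagonal. It does nothing useful for one stray point on an otherwise reasonable curve.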


asked Dec 09 '12 04:12 by eric



2 Answers

Note: based on some very helpful emails from the people who wrote the articles cited above, and on the discussion above, the right answer seems to be: do not try to "fix" individual points in an ROC curve unless you build an entirely new classifier, and if you do, be sure to hold out some test data to check whether that was a reasonable thing to do.

Getting points below the identity line is something that simply happens. It's like getting an individual classifier that scores 45% correct even though chance performance is 50%. That's just part of the variability with real data sets, and unless the score is significantly lower than chance alone would predict, it isn't something you should worry too much about. If, say, your classifier gets 20% correct, then clearly something is amiss and you might look into the specific reasons and fix your classifier.
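To put rough numbers on that, here is a small Python sketch. It assumes a hypothetical balanced test set of 100 samples and a classifier that guesses at chance (p = 0.5); neither figure comes from the question, and `prob_at_most` is a made-up helper name.

```python
from math import comb

def prob_at_most(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer correct out of n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n = 100  # hypothetical test-set size, not taken from the post

print(f"P(<= 45% correct by chance) = {prob_at_most(45, n):.3f}")
print(f"P(<= 20% correct by chance) = {prob_at_most(20, n):.2e}")
```

Under these assumptions a 45% score is an unremarkable outcome for a chance-level classifier, while a 20% score is essentially impossible by luck alone, which is why the latter points to a real problem such as swapped labels.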

answered Oct 18 '22 08:10 by eric


Yes, swapping a point for (1-fp, 1-tp) is theoretically effective, but increasing sample size is a safe bet too.

It does seem that your system has a non-monotonic response characteristic, so be careful not to bend the rules of the ROC too much or you will compromise the robustness of the AUC.

That said, you could try to use a Pareto Frontier Curve (Pareto Front). If that fits the requirements of "Repairing Concavities" then you'll basically sort the points so that the ROC curve becomes monotonic.
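As one possible reading of that suggestion, here is a Python sketch of a Pareto front over ROC points: sort by fp and drop any point that another point beats on both axes. The curve is hypothetical (only points A and B come from the question), and `pareto_front` and `trapezoid_auc` are made-up helper names, not library functions.

```python
def pareto_front(points):
    """Return the non-dominated (fp, tp) points, sorted by fp.

    A point is dominated if another point has fp no larger and tp no smaller.
    """
    front = []
    best_tp = float("-inf")
    # Sort by fp ascending; for equal fp, put the highest tp first.
    for fp, tp in sorted(set(points), key=lambda p: (p[0], -p[1])):
        if tp > best_tp:  # strictly better than everything to its left
            front.append((fp, tp))
            best_tp = tp
    return front

def trapezoid_auc(curve):
    """Area under a curve given as (fp, tp) points sorted by fp."""
    return sum(
        (fp2 - fp1) * (tp1 + tp2) / 2
        for (fp1, tp1), (fp2, tp2) in zip(curve, curve[1:])
    )

# Hypothetical curve with A = (0.3, 0.2) already replaced in place by B = (0.7, 0.8).
swapped = [(0.0, 0.0), (0.1, 0.15), (0.7, 0.8), (0.6, 0.7), (1.0, 1.0)]

repaired = pareto_front(swapped)
print(repaired)
print(trapezoid_auc(repaired))
```

Keep in mind that any repair of this kind picks points after looking at the data, so the resulting area is likely to be optimistic unless it is checked on held-out data.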

answered Oct 18 '22 06:10 by Jake Hertenstein