
How Does R Calculate the False Discovery Rate

Tags:

r

fdr

I appear to be getting inconsistent results when I use R's p.adjust function to calculate the False Discovery Rate. Based on the paper cited in the documentation, the adjusted p-value at rank i should be calculated like this:

adjusted_p[i] = p[i] * (total_number_of_tests / i)

Now when I run p.adjust(c(0.0001, 0.0004, 0.0019), "fdr") I get the expected result of

c(0.0003, 0.0006, 0.0019)

but when I run p.adjust(c(0.517479039, 0.003657195, 0.006080152),"fdr") I get this

c(0.517479039, 0.009120228, 0.009120228)

Instead of the result I calculate:

c(0.517479039, 0.010971585, 0.009120228)

What is R doing to the data that can account for both of these results?

Asked May 01 '15 by JGibbRandomNumber

People also ask

How is false discovery rate calculated?

Simply put, FDR = FP / (FP + TP). FDR-controlling procedures provide less stringent control of Type I errors compared to familywise error rate (FWER) controlling procedures (such as the Bonferroni correction), which control the probability of at least one Type I error.
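As a tiny numeric illustration of that fraction (the counts below are hypothetical, not from the thread), in Python:

```python
# Hypothetical confusion counts, assumed for illustration only:
fp = 10    # false positives: truly null hypotheses that were rejected
tp = 190   # true positives: truly non-null hypotheses that were rejected

# FDR = FP / (FP + TP): the proportion of rejections that are false
fdr = fp / (fp + tp)
print(fdr)  # → 0.05
```

So out of 200 rejected hypotheses, 10 were truly null, giving an FDR of 5%.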

What does FDR 0.05 mean?

An FDR-adjusted p-value (also called a q-value) of 0.05 indicates that 5% of the tests called significant are expected to be false positives. In other words, at an FDR of 5%, among all results called significant, only 5% are expected to be truly null.

Is false discovery rate the same as p-value?

The false discovery rate is the complement of the positive predictive value (PPV) which is the probability that, when you get a 'significant' result there is actually a real effect. So, for example, if the false discovery rate is 70%, the PPV is 30%.

Is false discovery rate the same as false positive rate?

The FPR is telling you the proportion of all the people who do not have the disease who will be identified as having the disease. The FDR is telling you the proportion of all the people identified as having the disease who do not have the disease. Both are therefore useful, distinct measures of failure.


1 Answer

The reason is that the FDR calculation ensures that FDR never increases as the p-value decreases. That's because you can always choose to set a higher threshold for your rejection rule if that higher threshold will get you a lower FDR.

In your case, your second hypothesis had a p-value of 0.003657195 and a raw BH value of 0.010971585, but the third hypothesis had a larger p-value (0.006080152) and a smaller BH value (0.009120228). If you can achieve an FDR of 0.009120228 by setting your p-value threshold to 0.006080152, there is never a reason to set a lower threshold just to get a higher FDR.

You can see this in the code by typing p.adjust:

...
}, BH = {
    i <- lp:1L                       # lp is length(p); ranks n, n-1, ..., 1
    o <- order(p, decreasing = TRUE) # indices of p, largest p-value first
    ro <- order(o)                   # permutation to restore the input order
    pmin(1, cummin(n/i * p[o]))[ro]

The cummin function takes the cumulative minimum of the vector, moving from the largest p-value to the smallest (note the decreasing order), which is exactly what guarantees the monotonicity described above.
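The same multiply-then-running-minimum logic can be replicated outside R. Here is a minimal Python sketch (an illustration, not R's actual implementation) applied to the second vector from the question:

```python
def bh_adjust(p):
    """Benjamini-Hochberg adjustment mirroring R's p.adjust(p, "BH"):
    multiply each sorted p-value by n/i, then take a running minimum
    starting from the largest p-value."""
    n = len(p)
    # indices of p ordered by descending p-value (largest first)
    order = sorted(range(n), key=lambda k: p[k], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, idx in enumerate(order):
        i = n - rank_from_top  # rank i = n, n-1, ..., 1
        running_min = min(running_min, p[idx] * n / i)
        adjusted[idx] = min(1.0, running_min)
    return adjusted

print(bh_adjust([0.517479039, 0.003657195, 0.006080152]))
```

Up to floating-point rounding, this reproduces p.adjust's c(0.517479039, 0.009120228, 0.009120228): the second hypothesis's raw value 0.010971585 is overwritten by the running minimum 0.009120228.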

You can see this in the Benjamini-Hochberg paper you link to, including in the definition of the procedure on page 293, which states (emphasis mine):

let k be the largest i for which P(i) <= (i/m) q*;

then reject all H_(i), i = 1, 2, ..., k
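That quoted step-up rule can be sketched directly. This Python version (an illustration, not R's code, with q = 0.05 as an assumed control level) applies it to the p-values from the question:

```python
def bh_reject(p, q=0.05):
    """Step-up rule from the quoted procedure: find the largest i
    (1-based, over the sorted p-values) with P(i) <= (i/m) * q."""
    m = len(p)
    sp = sorted(p)
    k = 0
    for i, pi in enumerate(sp, start=1):
        if pi <= i / m * q:
            k = i
    return sp[:k]  # reject the hypotheses with the k smallest p-values

print(bh_reject([0.517479039, 0.003657195, 0.006080152]))
# → [0.003657195, 0.006080152]
```

Here the thresholds (i/m)·q are 0.0167, 0.0333, and 0.05; the largest i satisfying the rule is k = 2, so the two smallest p-values are rejected, consistent with both having an adjusted value of 0.009120228 below 0.05.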

Answered Sep 22 '22 by David Robinson