I appear to be getting inconsistent results when I use R's p.adjust function to calculate the False Discovery Rate. Based upon the paper cited in the documentation, the adjusted p-value should be calculated like this:
adjusted_p_at_index_i = p_at_index_i * (total_number_of_tests / i)
Now when I run p.adjust(c(0.0001, 0.0004, 0.0019),"fdr")
I get the expected results of
c(0.0003, 0.0006, 0.0019)
but when I run p.adjust(c(0.517479039, 0.003657195, 0.006080152),"fdr")
I get this
c(0.517479039, 0.009120228, 0.009120228)
Instead of the result I calculate:
c(0.517479039, 0.010971585, 0.009120228)
What is R doing to the data that can account for both of these results?
Simply put, the FDR is the expected value of FP / (FP + TP), i.e. the expected proportion of false positives among all hypotheses called significant. FDR-controlling procedures provide less stringent control of Type I errors than familywise error rate (FWER) controlling procedures (such as the Bonferroni correction), which control the probability of making at least one Type I error.
An FDR-adjusted p-value (also called a q-value) of 0.05 indicates that, among all tests called significant at that cutoff, 5% are expected to be false positives. In other words, an FDR of 5% means that, of all results called significant, only 5% are expected to be truly null.
The false discovery rate is the complement of the positive predictive value (PPV), which is the probability that a 'significant' result reflects a real effect. So, for example, if the false discovery rate is 70%, the PPV is 30%.
The FPR is telling you the proportion of all the people who do not have the disease who will be identified as having the disease. The FDR is telling you the proportion of all the people identified as having the disease who do not have the disease. Both are therefore useful, distinct measures of failure.
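The disease-testing distinction can be made concrete with a small R sketch. All the numbers here (prevalence, sensitivity, false positive rate) are invented purely for illustration:

```r
# Hypothetical screening study: 10,000 people, 1% of whom have the disease.
total    <- 10000
diseased <- 100
healthy  <- total - diseased

# Assume the test catches 90% of true cases and flags 5% of healthy people.
TP <- 0.90 * diseased   # 90 true positives
FP <- 0.05 * healthy    # 495 false positives

FPR <- FP / healthy     # proportion of healthy people flagged: 0.05
FDR <- FP / (FP + TP)   # proportion of flagged people who are healthy

FPR   # 0.05
FDR   # ~0.846: most positives are false, despite only a 5% FPR
```

Note how a test with a modest 5% false positive rate still produces an FDR above 80% when the condition is rare; the two measures answer different questions.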
The reason is that the FDR calculation ensures that FDR never increases as the p-value decreases. That's because you can always choose to set a higher threshold for your rejection rule if that higher threshold will get you a lower FDR.
In your case, your second hypothesis had a p-value of 0.003657195 and a naively adjusted value of 0.010971585, but the third hypothesis had a larger p-value (0.006080152) and a smaller adjusted value (0.009120228). If you can achieve an FDR of 0.009120228 by setting your p-value threshold to 0.006080152, there is never a reason to set a lower threshold just to get a higher FDR.
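You can check this with the data from the question: compute the naive per-hypothesis adjustment yourself and compare it with p.adjust. The monotonicity step is the only difference:

```r
p <- c(0.517479039, 0.003657195, 0.006080152)
n <- length(p)

# Naive adjustment: scale each p-value by n / rank.
naive <- p * n / rank(p)
naive
# 0.517479039 0.010971585 0.009120228

# p.adjust additionally enforces that the adjusted values never
# increase as p decreases, pulling the second value down to 0.009120228:
p.adjust(p, "fdr")
# 0.517479039 0.009120228 0.009120228
```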
You can see this in the code by typing p.adjust at the R prompt:
    ...
    }, BH = {
        i <- lp:1L
        o <- order(p, decreasing = TRUE)
        ro <- order(o)
        pmin(1, cummin(n/i * p[o]))[ro]
    }, ...
The cummin function takes the running minimum of the vector, moving through the p-values from largest to smallest (p[o] is sorted in decreasing order), which is what enforces the monotonicity described above.
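Putting those lines together, you can replicate the BH branch of p.adjust by hand (in the single-method case, lp and n are both just length(p)):

```r
p  <- c(0.517479039, 0.003657195, 0.006080152)
n  <- lp <- length(p)

i  <- lp:1L                        # ranks, from largest p to smallest
o  <- order(p, decreasing = TRUE)  # indices that sort p in decreasing order
ro <- order(o)                     # inverse permutation, restores input order

n/i * p[o]                         # naive adjustment, largest p first
# 0.517479039 0.009120228 0.010971585

pmin(1, cummin(n/i * p[o]))[ro]    # cummin removes the non-monotone bump
# 0.517479039 0.009120228 0.009120228
```

The final line matches p.adjust(p, "fdr") exactly.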
You can see this in the Benjamini-Hochberg paper you link to, including in the definition of the procedure on page 293, which states (emphasis mine):
    let k be the largest i for which P_(i) <= (i/m) q*;
    then reject all H_(i), i = 1, 2, ..., k
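That step-up rule can also be expressed directly. Here is a minimal sketch, with the hypothetical helper name bh_reject and the level q = 0.05 chosen arbitrarily for illustration:

```r
bh_reject <- function(p, q = 0.05) {
  m  <- length(p)
  ps <- sort(p)
  # k: the largest i for which P_(i) <= (i/m) * q (0 if none qualify)
  k <- max(c(0, which(ps <= seq_len(m) / m * q)))
  # reject every hypothesis whose p-value is at most P_(k)
  p <= if (k == 0) -Inf else ps[k]
}

bh_reject(c(0.517479039, 0.003657195, 0.006080152))
# FALSE TRUE TRUE

# Equivalent to thresholding the adjusted p-values at q:
p.adjust(c(0.517479039, 0.003657195, 0.006080152), "fdr") <= 0.05
# FALSE TRUE TRUE
```

Both formulations reject the same set of hypotheses, which is why reporting adjusted p-values and comparing them to q is a convenient way to apply the procedure.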