
How Does R Calculate the False Discovery Rate

Tags:

r

fdr

I appear to be getting inconsistent results when I use R's p.adjust function to calculate the False Discovery Rate. Based on the paper cited in the documentation, the adjusted p-value at rank i should be calculated like this:

adjusted_p[i] = p[i] * (total_number_of_tests / i)

Now when I run p.adjust(c(0.0001, 0.0004, 0.0019), "fdr") I get the expected result of

c(0.0003, 0.0006, 0.0019)

but when I run p.adjust(c(0.517479039, 0.003657195, 0.006080152),"fdr") I get this

c(0.517479039, 0.009120228, 0.009120228)

Instead of the result I calculate:

c(0.517479039, 0.010971585, 0.009120228)

What is R doing to the data that can account for both of these results?

Asked May 01 '15 by JGibbRandomNumber

People also ask

How is false discovery rate calculated?

Simply put, FDR = FP / (FP + TP). FDR-controlling procedures provide less stringent control of Type I errors compared to familywise error rate (FWER) controlling procedures (such as the Bonferroni correction), which control the probability of at least one Type I error.
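As a tiny numeric illustration of that fraction (the counts below are hypothetical, not from the thread), in Python:

```python
# Hypothetical confusion counts, assumed for illustration only:
fp = 10    # false positives: truly null hypotheses that were rejected
tp = 190   # true positives: truly non-null hypotheses that were rejected

# FDR = FP / (FP + TP): the proportion of rejections that are false
fdr = fp / (fp + tp)
print(fdr)  # → 0.05
```

So out of 200 rejected hypotheses, 10 were truly null, giving an FDR of 5%.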

What does FDR 0.05 mean?

An FDR-adjusted p-value (also called a q-value) of 0.05 indicates that 5% of the tests called significant are expected to be false positives. In other words, at an FDR of 5%, among all results called significant, only 5% are expected to be truly null.

Is false discovery rate the same as p-value?

The false discovery rate is the complement of the positive predictive value (PPV) which is the probability that, when you get a 'significant' result there is actually a real effect. So, for example, if the false discovery rate is 70%, the PPV is 30%.

Is false discovery rate the same as false positive rate?

The FPR is telling you the proportion of all the people who do not have the disease who will be identified as having the disease. The FDR is telling you the proportion of all the people identified as having the disease who do not have the disease. Both are therefore useful, distinct measures of failure.


1 Answer

The reason is that the FDR calculation ensures that FDR never increases as the p-value decreases. That's because you can always choose to set a higher threshold for your rejection rule if that higher threshold will get you a lower FDR.

In your case, your second hypothesis had a p-value of 0.003657195 and a raw BH value of 0.010971585, but the third hypothesis had a larger p-value (0.006080152) and a smaller BH value (0.009120228). If you can achieve an FDR of 0.009120228 by setting your p-value threshold to 0.006080152, there is never a reason to set a lower threshold just to get a higher FDR.

You can see this in the code by typing p.adjust:

...
}, BH = {
    i <- lp:1L                       # lp is length(p); ranks n, n-1, ..., 1
    o <- order(p, decreasing = TRUE) # indices of p, largest p-value first
    ro <- order(o)                   # permutation to restore the input order
    pmin(1, cummin(n/i * p[o]))[ro]

The cummin function takes the cumulative minimum of the vector, moving from the largest p-value to the smallest (note the decreasing order), which is exactly what guarantees the monotonicity described above.
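The same multiply-then-running-minimum logic can be replicated outside R. Here is a minimal Python sketch (an illustration, not R's actual implementation) applied to the second vector from the question:

```python
def bh_adjust(p):
    """Benjamini-Hochberg adjustment mirroring R's p.adjust(p, "BH"):
    multiply each sorted p-value by n/i, then take a running minimum
    starting from the largest p-value."""
    n = len(p)
    # indices of p ordered by descending p-value (largest first)
    order = sorted(range(n), key=lambda k: p[k], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_from_top, idx in enumerate(order):
        i = n - rank_from_top  # rank i = n, n-1, ..., 1
        running_min = min(running_min, p[idx] * n / i)
        adjusted[idx] = min(1.0, running_min)
    return adjusted

print(bh_adjust([0.517479039, 0.003657195, 0.006080152]))
```

Up to floating-point rounding, this reproduces p.adjust's c(0.517479039, 0.009120228, 0.009120228): the second hypothesis's raw value 0.010971585 is overwritten by the running minimum 0.009120228.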

You can see this in the Benjamini-Hochberg paper you link to, including in the definition of the procedure on page 293, which states (emphasis mine):

let k be the largest i for which P(i) <= (i/m) q*;

then reject all H_(i), i = 1, 2, ..., k
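That quoted step-up rule can be sketched directly. This Python version (an illustration, not R's code, with q = 0.05 as an assumed control level) applies it to the p-values from the question:

```python
def bh_reject(p, q=0.05):
    """Step-up rule from the quoted procedure: find the largest i
    (1-based, over the sorted p-values) with P(i) <= (i/m) * q."""
    m = len(p)
    sp = sorted(p)
    k = 0
    for i, pi in enumerate(sp, start=1):
        if pi <= i / m * q:
            k = i
    return sp[:k]  # reject the hypotheses with the k smallest p-values

print(bh_reject([0.517479039, 0.003657195, 0.006080152]))
# → [0.003657195, 0.006080152]
```

Here the thresholds (i/m)·q are 0.0167, 0.0333, and 0.05; the largest i satisfying the rule is k = 2, so the two smallest p-values are rejected, consistent with both having an adjusted value of 0.009120228 below 0.05.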

Answered Sep 22 '22 by David Robinson