p-value from fisher.test() does not match phyper()

Fisher's exact test is based on the hypergeometric distribution, so I would expect these two commands to return identical p-values. Can anyone explain what I'm doing wrong that makes them differ?

#data (variable names chosen to match dhyper() argument names)
x = 14
m = 20
n = 41047
k = 40

#Fisher test, alternative = 'greater'
(fisher.test(matrix(c(x, m-x, k-x, n-(k-x)),2,2), alternative='greater'))$p.value 
#returns 2.01804e-39

#hypergeometric distribution, lower.tail = FALSE, i.e. P[X > x]
phyper(x, m, n, k, lower.tail = FALSE, log.p = FALSE)
#returns 5.115862e-43
R-Peys asked Oct 29 '18 18:10


1 Answer

The phyper call that matches your fisher.test call is phyper(x - 1, m, n, k, lower.tail = FALSE). To see why, look at the source code for fisher.test and follow the path taken by your call, fisher.test(matrix(c(x, m-x, k-x, n-(k-x)),2,2), alternative='greater'). At line 138, PVAL is set to:

switch(alternative,
    less = pnhyper(x, or),
    greater = pnhyper(x, or, upper.tail = TRUE),
    two.sided = {
        if (or == 0) as.numeric(x == lo)
        else if (or == Inf) as.numeric(x == hi)
        else {
            relErr <- 1 + 10^(-7)
            d <- dnhyper(or)
            sum(d[d <= d[x - lo + 1] * relErr])
        }
    })

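Since alternative = 'greater', PVAL is set to pnhyper(x, or, upper.tail = TRUE). You can see pnhyper defined on line 122. Here, or = 1, which is passed as ncp, so the call reduces to phyper(x - 1, m, n, k, lower.tail = FALSE). The x - 1 matters because phyper(q, ..., lower.tail = FALSE) returns P[X > q], which excludes q itself, whereas the one-sided test needs P[X >= x], which includes the observed count.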

With your values:

x = 14
m = 20
n = 41047
k = 40
phyper(x - 1, m, n, k, lower.tail = FALSE)
# [1] 2.01804e-39
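
As a rough check of the reasoning above, the upper tail P[X >= x] can also be built directly from dhyper and compared with both phyper and fisher.test. This is only a sketch: the helper rel_diff and the p_* variable names are made up for illustration. The comparison is done on the ratio because the values are so small that an absolute-difference check would pass trivially.

```r
# Values from the question
x <- 14; m <- 20; n <- 41047; k <- 40

# Hypothetical helper: relative difference between two positive numbers
rel_diff <- function(a, b) abs(a / b - 1)

# P[X >= x]: sum the point probabilities from x up to the largest possible count
p_sum <- sum(dhyper(x:min(m, k), m, n, k))

# Same tail via phyper: P[X > x - 1] is P[X >= x] for an integer-valued X
p_upper <- phyper(x - 1, m, n, k, lower.tail = FALSE)

# The call from the question: P[X > x], which leaves out the observed count
p_strict <- phyper(x, m, n, k, lower.tail = FALSE)

# One-sided Fisher test on the same 2x2 table
tab <- matrix(c(x, m - x, k - x, n - (k - x)), 2, 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(p_sum = p_sum, p_upper = p_upper, p_fisher = p_fisher, p_strict = p_strict))

# The first three agree; p_strict is smaller by the point mass at x
print(rel_diff(p_sum, p_upper))
print(rel_diff(p_fisher, p_upper))
print(rel_diff(p_upper - p_strict, dhyper(x, m, n, k)))
```

The three relative differences should all be close to zero, which shows that the mismatch in the question comes entirely from whether the observed count x is included in the tail.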
De Novo answered Sep 17 '22 12:09