Fisher's exact test is based on the hypergeometric distribution, so I would expect these two commands to return identical p-values. Can anyone explain what I'm doing wrong that makes them differ?
#data (variable names chosen to match dhyper() argument names)
x = 14
m = 20
n = 41047
k = 40
#Fisher test, alternative = 'greater'
(fisher.test(matrix(c(x, m-x, k-x, n-(k-x)),2,2), alternative='greater'))$p.value
#returns 2.01804e-39
#hypergeometric distribution, lower.tail = F, i.e. P[X > x]
phyper(x, m, n, k, lower.tail = F, log.p = F)
#returns 5.115862e-43
Condition on the marginal counts. Then $\Pr(\text{table}) \propto 1 / \prod_{i,j} n_{ij}!$. Consider all possible tables with the observed marginal counts and calculate Pr(table) for each of them. The P-value is the sum of the probabilities of all tables whose probability is equal to or smaller than that of the observed table.
The Fisher-exact P value corresponds to the proportion of values of the test statistic that are as extreme (i.e., as unusual) or more extreme than the observed value of that test statistic.
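The "sum over tables no more probable than the observed one" rule can be sketched directly in R. This is only an illustration under the null hypothesis of odds ratio 1, reusing the question's data; the names support, d, rel_err, p_two_sided and p_fisher_two are made up for this sketch, and the small tolerance factor imitates the relErr used in the fisher.test source.

```r
# Data from the question (names match the dhyper() arguments)
x <- 14; m <- 20; n <- 41047; k <- 40

# With fixed margins, a 2x2 table is determined by its top-left cell,
# so enumerating the tables means enumerating the possible values of that cell.
lo <- max(0, k - n)
hi <- min(k, m)
support <- lo:hi

# Probability of each possible table under the null (odds ratio = 1)
d <- dhyper(support, m, n, k)

# Two-sided p-value: sum the probabilities of all tables that are no more
# probable than the observed one (small tolerance for floating-point ties).
rel_err <- 1 + 1e-7
p_two_sided <- sum(d[d <= d[x - lo + 1] * rel_err])

# Compare with fisher.test() using its default two-sided alternative
p_fisher_two <- fisher.test(matrix(c(x, m - x, k - x, n - (k - x)), 2, 2))$p.value
print(c(manual = p_two_sided, fisher = p_fisher_two))
```

The comparison is printed rather than checked with all.equal(), because all.equal() switches to an absolute tolerance when the target is very small and would report numbers of this magnitude as equal even when they differ.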
Assumptions. The row and column totals are fixed, not random. Sampling or allocation are random and observations are mutually independent within the constraints of fixed marginal totals. Each observation is mutually exclusive - in other words each observation can only be classified in one cell.
In this case, the actual call to phyper that is relevant is phyper(x - 1, m, n, k, lower.tail = FALSE). Look at the source code for fisher.test relevant to your call of fisher.test(matrix(c(x, m-x, k-x, n-(k-x)),2,2), alternative='greater'). At line 138, PVAL is set to:
switch(alternative,
       less = pnhyper(x, or),
       greater = pnhyper(x, or, upper.tail = TRUE),
       two.sided = {
           if (or == 0) as.numeric(x == lo)
           else if (or == Inf) as.numeric(x == hi)
           else {
               relErr <- 1 + 10^(-7)
               d <- dnhyper(or)
               sum(d[d <= d[x - lo + 1] * relErr])
           }
       })
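Since alternative = 'greater', PVAL is set to pnhyper(x, or, upper.tail = TRUE). You can see pnhyper defined on line 122. Here, or = 1, which is passed to ncp, so the call is phyper(x - 1, m, n, k, lower.tail = FALSE). That is P[X > x - 1] = P[X >= x], which includes the probability of the observed table itself. Your call phyper(x, m, n, k, lower.tail = F) computes P[X > x] and leaves that term out, which is why the two results differ.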
With your values:
x = 14
m = 20
n = 41047
k = 40
phyper(x - 1, m, n, k, lower.tail = FALSE)
# [1] 2.01804e-39
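As a quick check of the off-by-one explanation, the sketch below compares the two tail probabilities and confirms that they differ by exactly the point mass at x. The variable names (p_strict, p_incl, p_point, p_fisher) are made up for this illustration, and the comparisons use explicit relative errors because all.equal() falls back to an absolute tolerance for values this small.

```r
# Data from the question
x <- 14; m <- 20; n <- 41047; k <- 40

p_strict <- phyper(x, m, n, k, lower.tail = FALSE)      # P[X >  x], as in the question
p_incl   <- phyper(x - 1, m, n, k, lower.tail = FALSE)  # P[X >= x], as used by fisher.test
p_point  <- dhyper(x, m, n, k)                          # P[X == x], the observed table

# The inclusive tail should equal the strict tail plus the point mass at x
rel_gap <- abs(p_incl - (p_strict + p_point)) / p_incl
print(rel_gap < 1e-6)

# The one-sided Fisher p-value should agree with the inclusive tail
p_fisher <- fisher.test(matrix(c(x, m - x, k - x, n - (k - x)), 2, 2),
                        alternative = "greater")$p.value
print(abs(p_fisher - p_incl) / p_incl < 1e-6)
```

Both comparisons should print TRUE if the explanation holds. Here almost all of the inclusive tail comes from the observed table itself, which is why dropping it changes the result by several orders of magnitude.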