I'm trying to calculate the p-value of an F-statistic in R. The calculation R uses in the lm() function is equivalent to (e.g. assume x=100, df1=2, df2=40):
pf(100, 2, 40, lower.tail=F)
[1] 2.735111e-16
which should be equal to
1-pf(100, 2, 40)
[1] 2.220446e-16
It is not the same! The difference is not big, but where does it come from? If I calculate (x=5, df1=2, df2=40):
pf(5, 2, 40, lower.tail=F)
[1] 0.01152922
1-pf(5, 2, 40)
[1] 0.01152922
it is exactly the same. Question is...what is happening here? Have I missed something?
The F statistic has two degrees of freedom, one for the numerator and one for the denominator, and the F test uses the right tail of the F distribution. Therefore, to find the p-value for an F statistic, we pass the F statistic, both degrees of freedom, and the lower.tail=FALSE argument to the pf function.
We can calculate p-values in R by using the cumulative distribution function and the inverse cumulative distribution function (quantile function) of the known sampling distribution.
The R function df(x, df1, df2) gives the density of the F distribution at x when the degrees of freedom are df1 and df2. The R function pf(q, df1, df2, lower.tail) gives the cumulative probability (lower.tail = TRUE for the left tail, lower.tail = FALSE for the right tail).
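As a minimal sketch of how these functions fit together, the snippet below reuses the x=5, df1=2, df2=40 case from the question. The variable names (f_stat, p_value, f_back, dens) are made up for illustration.

```r
# Illustrative values taken from the question: F = 5 with 2 and 40 degrees of freedom.
f_stat <- 5
df1 <- 2
df2 <- 40

# Upper-tail probability P(F > f_stat), i.e. the p-value of the F test.
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# The quantile function inverts the CDF, so this should recover f_stat
# (up to numerical tolerance).
f_back <- qf(p_value, df1, df2, lower.tail = FALSE)

# df() returns the density at f_stat, which is not a probability.
dens <- df(f_stat, df1, df2)

print(p_value)  # 0.01152922, as in the question
print(f_back)
print(dens)
```

Asking for the upper tail directly, rather than computing 1 minus the lower tail, is the form that stays accurate when the p-value is very small.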
> all.equal(pf(100, 2, 40, lower.tail=F),1-pf(100, 2, 40))
[1] TRUE
As the comments note, this is a floating-point precision issue. In fact, neither of the two examples you show is precisely equal as evaluated:
> pf(5, 2, 40, lower.tail=F) - (1-pf(5, 2, 40))
[1] 6.245005e-17
> pf(100, 2, 40, lower.tail=F) - (1-pf(100, 2, 40))
[1] 5.146652e-17
It's just that this difference is only apparent in your output for the much smaller number.
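To see where the discrepancy comes from, here is a rough sketch. It relies on the fact that for df1 = 2 the upper tail of the F distribution has the closed form (1 + 2x/df2)^(-df2/2), which serves as an independent check. The variable names are made up for illustration.

```r
x <- 100
df1 <- 2
df2 <- 40

upper_direct <- pf(x, df1, df2, lower.tail = FALSE)  # computed directly in the tail
upper_subtr  <- 1 - pf(x, df1, df2)                   # CDF rounded near 1, then subtracted

# Closed form of P(F > x), valid only when df1 = 2; used as a reference value.
upper_exact <- (1 + 2 * x / df2)^(-df2 / 2)

# Doubles just below 1 are spaced eps/2 apart, so 1 - pf(...) can only be a
# whole multiple of eps/2; everything finer is lost before the subtraction.
eps <- .Machine$double.eps
steps <- upper_subtr / (eps / 2)

print(c(direct = upper_direct, subtracted = upper_subtr, closed_form = upper_exact))
print(steps)
```

The direct call agrees with the closed form, while the subtracted version lands on the nearest representable step below 1. For x=5 the same rounding happens, but an error of that size is invisible next to a value of about 0.0115.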