I am looking for high precision values for the normal distribution in the tail (1e-10 and 1 - 1e-10)
, as the R package that I am using sets any number which is out of this range to these values and then calls the qnorm
and qt
function.
What I have noticed is that the qnorm
implementation in R is not symmetric when looking at the tails. This is quite surprising to me, as it is well known that this distribution is symmetric, and I have seen implementations in other languages that are symmetric. I have checked the qt
function and it is also not symmetric in the tails.
Here are the results from the qnorm function:
x qnorm(x) qnorm(1-x) qnorm(1-x) + qnorm(x)
1e-2 -2.3263478740408408 2.3263478740408408 0.0 (i.e < machine epsilon)
1e-3 -3.0902323061678132 3.0902323061678132 0.0 (i.e < machine epsilon)
1e-4 -3.71901648545568 3.7190164854557084 2.8421709430404007e-14
1e-5 -4.2648907939228256 4.2648907939238399 1.014299755297543e-12
1e-10 -6.3613409024040557 6.3613408896974208 -1.2706634855419452e-08
It is quite clear that at a value of x
close to 0 or 1, this function breaks down. Yes, in "normal" use this isn't a problem, but I am looking at fringe cases and multiplying small probabilities by very large values, in which case the error (1e-08)
becomes a large value.
Note: I have tried this with 1-x
and with entering the actual number 0.00001
and 0.99999
and the accuracy issue is still there.
Firstly, is this a known problem with the qnorm
and qt
implementations? I could not find anything in the documentation, the algorithm is supposed to be accurate 16 digits for p values from 10^-314
as described in the Algorithm AS 241 paper.
Quote from R doc:
Wichura, M. J. (1988) Algorithm AS 241: The percentage points of the normal distribution. Applied Statistics, 37, 477–484.
which provides precise results up to about 16 digits.
If the R code implements the 7 digit version, why does it claim 16 digits? Or is it "accurate" but the original algorithm is not symmetric and wrong?
If R does implement both versions of Algorithm AS 241 can I turn the 16 digit version on?
Or, is there a more accurate version of qnorm
in R?
Or, another solution to my problem where I need high precision in the tails for quantile functions.
>version
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 3.2
year 2016
month 10
day 31
svn rev 71607
language R
version.string R version 3.3.2 (2016-10-31)
nickname Sincere Pumpkin Patch
It turns out (as noted by Spencer Graves in his response to this same question on the R-devel list-serve) that qnorm()
does in fact perform as advertised. It's just that, to get highly accurate results in the distribution's upper tail, you'll need to avail yourself of the function's lower.tail
argument.
Here's how do that:
options(digits=22)
## For values of p in [0, 0.5], specify lower tail probabilities
qnorm(p = 1e-10) ## x: P(X <= x) == 1e-10
# [1] -6.3613409024040557
## For values of p in (0.5, 1], specify upper tail probabilities
qnorm(p = 1e-10, lower.tail=FALSE) ## x: P(X > x) == 1e-10 (correct approach)
# [1] 6.3613409024040557
qnorm(p = 1 - 1e-10) ## x: P(X <= x) == 1-(1e-1) (incorrect approach)
# [1] 6.3613408896974208
The problem is that 1-1e-10
(for example) is subject to floating point rounding errors, such that it isn't really the same distance from 1
(the upper end of the interval) as 1e-10
is from 0
(the lower end of the interval). The underlying problem (it's R-FAQ 7.31!) becomes obvious when put in a more familiar guise:
1 - (1 - 1e-10) == 1e-10
## [1] FALSE
Finally, here's a quick confirmation that qnorm()
provides accurate (or at least symmetrical) results out to the values claimed in its help file:
qnorm(1e-314)
## [1] -37.906647423565666
qnorm(1e-314, lower.tail=FALSE)
## [1] 37.906647423565666
## With this failing in just the way (and for just the reason) you'd now expect
qnorm(1-1e-314)
# [1] Inf
1 == (1-1e-314)
# [1] TRUE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With