Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting high precision values from qnorm in the tail

The problem

I am looking for high precision values for the normal distribution in the tail (1e-10 and 1 - 1e-10), as the R package that I am using sets any number which is out of this range to these values and then calls the qnorm and qt function.

What I have noticed is that the qnorm implementation in R is not symmetric when looking at the tails. This is quite surprising to me, as it is well known that this distribution is symmetric, and I have seen implementations in other languages that are symmetric. I have checked the qt function and it is also not symmetric in the tails.

Here are the results from the qnorm function:

x       qnorm(x)                qnorm(1-x)              qnorm(1-x) + qnorm(x)
1e-2    -2.3263478740408408     2.3263478740408408      0.0 (i.e < machine epsilon)
1e-3    -3.0902323061678132     3.0902323061678132      0.0 (i.e < machine epsilon)
1e-4    -3.71901648545568       3.7190164854557084      2.8421709430404007e-14
1e-5    -4.2648907939228256     4.2648907939238399      1.014299755297543e-12
1e-10   -6.3613409024040557     6.3613408896974208      -1.2706634855419452e-08

It is quite clear that at a value of x close to 0 or 1, this function breaks down. Yes, in "normal" use this isn't a problem, but I am looking at fringe cases and multiplying small probabilities by very large values, in which case the error (1e-08) becomes a large value.

Note: I have tried this with 1-x and with entering the actual number 0.00001 and 0.99999 and the accuracy issue is still there.

The questions

Firstly, is this a known problem with the qnorm and qt implementations? I could not find anything in the documentation, the algorithm is supposed to be accurate 16 digits for p values from 10^-314as described in the Algorithm AS 241 paper.

Quote from R doc:

Wichura, M. J. (1988) Algorithm AS 241: The percentage points of the normal distribution. Applied Statistics, 37, 477–484.

which provides precise results up to about 16 digits.

If the R code implements the 7 digit version, why does it claim 16 digits? Or is it "accurate" but the original algorithm is not symmetric and wrong?

If R does implement both versions of Algorithm AS 241 can I turn the 16 digit version on?

Or, is there a more accurate version of qnorm in R? Or, another solution to my problem where I need high precision in the tails for quantile functions.

R version

>version 
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          3.2                         
year           2016                        
month          10                          
day            31                          
svn rev        71607                       
language       R                           
version.string R version 3.3.2 (2016-10-31)
nickname       Sincere Pumpkin Patch   
like image 222
Sheldon Avatar asked Apr 12 '17 07:04

Sheldon


1 Answers

It turns out (as noted by Spencer Graves in his response to this same question on the R-devel list-serve) that qnorm() does in fact perform as advertised. It's just that, to get highly accurate results in the distribution's upper tail, you'll need to avail yourself of the function's lower.tail argument.

Here's how do that:

options(digits=22)

## For values of p in [0, 0.5], specify lower tail probabilities 
qnorm(p = 1e-10)                      ## x: P(X <= x) == 1e-10
# [1] -6.3613409024040557

## For values of p in (0.5, 1], specify upper tail probabilities
qnorm(p = 1e-10, lower.tail=FALSE)    ## x: P(X > x)  == 1e-10     (correct approach)
# [1] 6.3613409024040557
qnorm(p = 1 - 1e-10)                  ## x: P(X <= x) == 1-(1e-1)  (incorrect approach)
# [1] 6.3613408896974208

The problem is that 1-1e-10 (for example) is subject to floating point rounding errors, such that it isn't really the same distance from 1 (the upper end of the interval) as 1e-10 is from 0 (the lower end of the interval). The underlying problem (it's R-FAQ 7.31!) becomes obvious when put in a more familiar guise:

1 - (1 - 1e-10) == 1e-10
## [1] FALSE

Finally, here's a quick confirmation that qnorm() provides accurate (or at least symmetrical) results out to the values claimed in its help file:

qnorm(1e-314)
## [1] -37.906647423565666
qnorm(1e-314, lower.tail=FALSE)
## [1] 37.906647423565666

## With this failing in just the way (and for just the reason) you'd now expect
qnorm(1-1e-314)
# [1] Inf
1 == (1-1e-314)
# [1] TRUE
like image 189
Josh O'Brien Avatar answered Nov 15 '22 06:11

Josh O'Brien