
How can I increase precision in R when calculating with probabilities close to 0 and 1?

I have a problem where variables representing probabilities in the [0,1] interval sum to 0 when the sum should be > 0. The problem is surely related to floating-point representation and precision in R, but I cannot pin down where it goes wrong.

options(digits = 22)
p1 = 0.8
p2 = 0.9999999999999998

p11 = p1 * p2               # both events occur
p10 = p1 - p11              # only the first event occurs
p01 = p2 - p11              # only the second event occurs
p00 = 1 - p11 - p10 - p01   # neither event occurs

p11, p10 and p01 are all numeric. p00 is also numeric, but

> p00
[1] 0

and p00 == 0 is TRUE on my machine. However, it should not be zero, as it can be shown mathematically that p00 > 0.

This problem seems to be related to the fact that p01 is small. However, p01 > 0 is still TRUE on my machine. Why does it go wrong when taking the final sum for p00?

Is there a numerical trick to solve this problem, i.e. to get an exact representation of p00? Note again that the values are probabilities in [0,1]. I considered using log and exp transformations, but without consistent success.
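For reference, here is a quick diagnostic I can run (everything below is base R; the values in the comments are approximate):

> .Machine$double.eps    # spacing of representable doubles near 1: about 2.2e-16
> (1 - p1) * (1 - p2)    # mathematically equal to p00: prints about 4.4e-17

So the value p00 should take is below the spacing of representable doubles near 1, which is presumably where the final sum loses it.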

asked Mar 13 '18 by tomka


2 Answers

R itself can only deal with 64-bit floating-point numbers, but the Rmpfr package can handle arbitrary-precision floating-point numbers:

library(Rmpfr)

> p1 = mpfr("0.8", 128)
> p2 = mpfr("0.9999999999999998", 128)

> p11 = p1 * p2
> p10 = p1 - p11
> p01 = p2 - p11
> p00 = 1 - p11 - p10 - p01

> p00
1 'mpfr' number of precision  128   bits 
[1] 4.00000000000000000000000461461738779728e-17
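If you need an ordinary double again after doing the sensitive arithmetic in high precision, you can convert back at the very end (asNumeric comes with Rmpfr; output shown with default digits):

> asNumeric(p00)
[1] 4e-17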

Edit: use strings to define the mpfr values.
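The reason: a bare numeric literal is rounded to a 64-bit double before mpfr ever sees it, so the conversion error sneaks back in. A small comparison (illustrative):

> mpfr(0.9999999999999998, 128)    # inherits the double's rounding error, about 2.2e-17
> mpfr("0.9999999999999998", 128)  # parsed directly at 128-bit precision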

answered Sep 25 '22 by gdkrmr


You want to avoid catastrophic cancellation, which occurs when subtracting values of approximately the same size: essentially this amplifies any existing errors.

One trick is to do the subtraction first. Algebraically, p00 = 1 - p11 - p10 - p01 = 1 - p1 - p2 + p1*p2 = (1 - p1)*(1 - p2), so:

> p1 = 0.8
> p2 = 0.9999999999999998
> (1-p1)*(1-p2)
[1] 4.440892e-17

The remaining inaccuracy is due to the fact that 0.9999999999999998 isn't actually 0.9999999999999998: it is stored as 0.9999999999999997779553950749686919152736663818359375, which is the closest representable floating-point number.
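You can see the stored value directly (sprintf here just formats the underlying double; 52 digits after the point is enough to show it exactly):

> sprintf("%.52f", 0.9999999999999998)
[1] "0.9999999999999997779553950749686919152736663818359375"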

Update: to explain a bit further what is going on.

Since most floating point operations cannot be done exactly, they incur some error: this is the difference between the result you get and the true mathematical result. It is inherently relative, rather than absolute (i.e. significant digits, not digits after the decimal place).

When thinking about error, you need to keep track of 2 things:

  1. the error from the operation
  2. the already accumulated error.

The first is not too difficult: most operations incur a small relative error of at most about 1e-16 (half a unit in the last place). There are also a couple of operations that incur no error at all: subtracting two numbers that are within a factor of 2 of each other, and multiplying or dividing by powers of 2.

The second is trickier, but basically you want to avoid anything that would amplify existing error. The most insidious case is subtracting two numbers that are roughly equal: the absolute error stays the same, but the relative error can increase dramatically. Multiplication and division are inherently relative operations, so they don't amplify error in this way.
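A small illustration with the numbers from the question (the subtraction itself is exact here, so all of the error comes from rounding the decimal constant):

> 1 - 0.9999999999999998
[1] 2.220446e-16

The true decimal answer is 2e-16; the absolute error of roughly 2.2e-17 was already present in the stored constant, but relative to the tiny result it is now about 11%.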

In your original code, you are doing a lot of such subtractions with quantities that have already accumulated some error. In my modification, I do the subtractions first, so this error is minimised (though in the case of p2, you can see that the result is still drastically influenced by the error in converting the decimal string "0.9999999999999998" to a binary floating-point number).

answered Sep 24 '22 by Simon Byrne