Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does R incorrectly perform sum here?

Tags:

r

precision

sum

(If the answer is indeed how R uses binary numbers to store small decimals, would still love to hear details on why this happens from someone with knowledge on this)

I'm using R to compute a sum, specifically this one:

enter image description here

Here's my R code:

r = 21 #Number of times the loop executes 
n = 52 
sum = 0 #Running total 
for(k in 0:r){
  sum = sum + (1-(choose(2^(k), n)*(factorial(n)/(2^(n*k)))))
  print(sum)
}

If you look at the output, you'll notice:

[1] 1
[1] 2
...
[1] 11.71419
[1] 11.71923
[1] 11.72176
[1] 12.72176
[1] 13.72176

Why does it start incrementing by 1 after the 19th iteration?

Are there other freely available computational engines that would be better suited for this task?

like image 793
zthomas.nc Avatar asked Dec 06 '15 16:12

zthomas.nc


2 Answers

All computations are using floating point numbers here, which is generally not well suited for factorials and raising powers (because such values get quickly very large which makes computation less accurate).

Please, try:

> factorial(52)/(2^(52*19))
[1] 3.083278e-230
> factorial(52)/(2^(52*20))
[1] 0

and compare with Pari-GP:

? \p 256
realprecision = 269 significant digits (256 digits displayed)
? precision( (52!) / 2^(52*19) + .  , 24)
%1 = 3.08327794368108826958435659488643289724 E-230
? precision( (52!) / 2^(52*20) + .  , 24)
%2 = 6.8462523287873017654100595727727116496 E-246
like image 173
Thomas Baruchel Avatar answered Sep 25 '22 08:09

Thomas Baruchel


If you're looking for a way to get around the overflow/underflow problem you're having, you can use logs (to keep the magnitudes reasonable for the intermediate calculations) and then exponentiate at the end:

options(digits=20)

for(k in 0:r){
  sum = sum + (1 - exp(lchoose(2^k, n) + log(factorial(n)) - k*n*log(2)))
  print(paste0(k,": ",sum))
}

[1] "0: 1"
[1] "1: 2"
[1] "2: 3"
...
[1] "19: 11.7217600143979"
[1] "20: 11.7230238079842"
[1] "21: 11.7236558993777"

To check that this is correct, I ran the original summation (without taking logs) in Mathematica and got the same results out to 12 decimal places.

Although you can do the problem in R, if you want to go with a computer algebra system (which allows you to do symbolic math and exact computations), Sage is free and open source.

like image 23
eipi10 Avatar answered Sep 24 '22 08:09

eipi10