Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Computing Errors R, combinations with large numbers creating wrong numbers only some off the time

I have created a combination formula in R to compute combinations of large numbers.

  combination_1 <- function(n,r){
  n_0 <- n
  num <- 1
  denom <- factorial(r)
  for(i in 1:(r)){
    num <- num * n_0
    n_0 <- n_0-1
  }
  num/denom
}

I have then applied this combination formula in a for loop testing when two quantities are equal and then print out the respective n, m, and r values.

 both_large <- function(n,m,r){m*combination_1(n,r)}
 both_small <- function(n,m,r){factorial(m)*combination_1(n,(r-1))} 

 for(m in 3:8){
 for(n in 20:20000){
    for(r in 1:20){
      if(both_large(n,m,r) - both_small(n,m,r) == 0){
       cat('r = ', r, ', n = ', n,', m= ',m, '\n')
      }
    }
   }
}

This code however, only works some of the time. The following output is shown below with a skipped value at r = 9.

r =  6 , n =  149 , m=  5 
r =  7 , n =  174 , m=  5 
r =  8 , n =  199 , m=  5 
r =  10 , n =  249 , m=  5 
r =  11 , n =  274 , m=  5 

There should definitely be a value at r = 9, n = 224, m = 5; however, when I run the subtraction for these specific values, R computes a value of -2. When I run it through Wolfram Alpha, it computes a value of 0. I have also found a way to simplify my formula down a little bit further and the simplified version also works out to 0.

Why will R not compute the value at r = 9, but then correctly compute larger values at r = 10 and r = 11? Is it some sort of round off error, and if it is why will it compute larger values? It will not compute other values as well. This is just the first case of it not happening.

Thanks!

like image 756
M. Rainey Avatar asked Feb 21 '26 16:02

M. Rainey


1 Answers

This is a problem with double precision. In the article, we see that double data types are stored in 64 bits with the following breakdown:

  1. Sign bit: 1 bit
  2. Exponent: 11 bits
  3. Significand precision: 53 bits (52 explicitly stored)

Converting this to base 10, we see that we are guaranteed to get at least 15 decimal digits of precision.

log10(2^53 - 1)
[1] 15.95459

We can see this in action by observing strange behavior with simple arithmetic:

options(scipen = 999)

1e16
[1] 10000000000000000

1e16 + 5
[1] 10000000000000004  ## incorrect.. should be 10000000000000005

With your example of r = 9, n = 224, and m = 5 along with some print statements inside your combination_1 function, we spot the culprit:

combination_1_Verbose <- function(n,r,verbose = FALSE){
    n_0 <- n
    num <- 1
    denom <- factorial(r)
    for(i in 1:(r)){
        num <- num * n_0
        n_0 <- n_0-1
    }

    if (verbose) {
        print(num)
        print(log10(num))
    }

    num/denom
}

combination_1_Verbose(n, r - 1, TRUE)
[1] 5585745606995474432
[1] 18.74708
[1] 138535357316356

We are doing arithmetic on over 18 digits.. outside the range of the precision afforded to us by the double data type.

What is also not obvious, is the fact that the returned value is not exactly 138535357316356. Using the digits argument of print, we actually see that the returned value is not a whole number.

print(combination_1_Verbose(n, r - 1), digits = 22)
[1] 138535357316356.015625

This ends up being the source of your error. If we take .015625 and multiply by factorial(m) = 120, we obtain:

.015625 * 120
[1] 1.875 

This gets rounded up to 2, which is the difference in our check.

We can corrects this behavior using the multiple precision library gmp:

library(gmp)
combination_1_GMP <- function(n,r,verbose = FALSE){
    n_0 <- as.bigz(n)
    num <- as.bigz(1)
    denom <- factorialZ(r)
    for(i in 1:(r)){
        num <- mul.bigz(num, n_0)
        n_0 <- sub.bigz(n_0, 1)
    }
    
    if (verbose) {
        print(num)
        print(log10(num))
    }
    
    as.bigz(num/denom)
}

combination_1_GMP(n, r-1, TRUE)
Big Integer ('bigz') :
[1] 5585745606995473920
[1] 18.74708
Big Integer ('bigz') :
[1] 138535357316356

In the original function, num was 5585745606995474432, whereas in our gmp example we obtained 5585745606995473920. Note that the difference is less than 500, a number with 3 digits. This makes sense as our number is over 18 digits and as we stated above, we are only guaranteed 15 total digits of precision (i.e. 18 - 3 = 15).

Alternatively, we could round the final result. I wouldn't recommend this option if precision is absolutely necessary, as there are still some values of n, m, and r that will still be at the mercy of double precision. It works in this example though:

combination_1_Round <- function(n,r){
    n_0 <- n
    num <- 1
    denom <- factorial(r)
    for(i in 1:(r)){
        num <- num * n_0
        n_0 <- n_0-1
    }
    round(num/denom)
}

both_large_r <- function(n,m,r){m*combination_1_Round(n,r)}
both_small_r <- function(n,m,r){factorial(m)*combination_1_Round(n,(r-1))}

both_large_r(n,m,r) - both_small_r(n,m,r)
[1] 0

And finally, your best option is to re-write your algorithm to keep the numbers within double precision limits.

combination_1_Improved <- function(n,r){
    
    denom <- num <- 1
    i <- (n - r + 1)
    
    for (denom in 1:r) {
        num <- num * i;
        num <- num / denom;
        i <- i + 1
    }
    
    num
}

print(combination_1_Improved(n,r-1), digits = 22)
[1] 138535357316356
like image 195
Joseph Wood Avatar answered Feb 25 '26 13:02

Joseph Wood



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!