Dealing with very small numbers in R

I need to calculate a list of very small numbers such as

(0.1)^1000, 0.2^(1200),

and then normalize them so they will sum up to one i.e.

a1 = 0.1^1000, a2 = 0.2^1200

And I want to calculate a1' = a1/(a1+a2), a2' = a2/(a1+a2).

I'm running into underflow problems, as I get a1 = 0. How can I get around this? In theory I could work with logs: log(a1) = 1000*log(0.1) would represent a1 without underflow problems. But in order to normalize I would need log(a1+a2), which I can't compute since I can't represent a1 directly.

I'm programming in R. As far as I can tell, there is no data type such as Decimal in C# that gives you better than double precision.

Any suggestions will be appreciated, thanks

dan12345 asked Apr 27 '11 10:04


2 Answers

Mathematically speaking, one of the normalized values will be approximately zero and the other approximately one. The difference between your numbers is so large that I wonder whether normalizing them makes sense.

But to do that in general, you can use the idea from the logspace_add C function under the hood of R. Given lx = log(x) and ly = log(y), one can define logxpy ( = log(x+y) ) as :

logxpy <- function(lx,ly) max(lx,ly) + log1p(exp(-abs(lx-ly)))

Which means that we can use :

> la1 <- 1000*log(0.1)
> la2 <- 1200*log(0.2)

> exp(la1 - logxpy(la1,la2))
[1] 5.807714e-162

> exp(la2 - logxpy(la1,la2))
[1] 1

This function can be called recursively as well if you have more numbers. Mind you, the second result is still exactly 1, and not 1 minus 5.807...e-162 : a double cannot resolve that difference. If you really need more precision and your platform supports long double types, you could code everything in e.g. C or C++ and return the results afterwards. But if I'm right, R can, for the moment, only deal with normal doubles, so ultimately you'll lose the precision again when the result is shown.
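For more than two numbers, one way to apply the function repeatedly is to fold it over a vector of logs with Reduce(). This is only a minimal sketch: the names `la`, `log_total` and `w` are illustrative, and the third value is made up to show the effect, it is not from the question.

```r
# Pairwise log(x + y) computed from lx = log(x) and ly = log(y)
logxpy <- function(lx, ly) max(lx, ly) + log1p(exp(-abs(lx - ly)))

# Logs of the raw values; the third entry is a made-up extra value
la <- c(1000 * log(0.1), 1200 * log(0.2), 1199 * log(0.2))

# Fold logxpy over the vector to get log(a1 + a2 + a3)
log_total <- Reduce(logxpy, la)

# Normalized weights: exp(log(a_i) - log(sum of all a_i))
w <- exp(la - log_total)
print(w)
print(sum(w))
```

Since 0.2^1199 is five times 0.2^1200, the last two weights should come out close to 5/6 and 1/6, while the first stays vanishingly small.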


EDIT :

to do the math for you :

log(x+y) = log( exp(lx) + exp(ly) )
         = log( exp(lx) * (1 + exp(ly - lx)) )
         = lx + log( 1 + exp(ly - lx) )

Now you just take the larger of the two as lx, so that ly - lx equals -abs(lx - ly), and you arrive at the expression in logxpy().
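The identity can be checked numerically on values that do not underflow, where log(x + y) can still be computed directly. A small sketch; the values of `x` and `y` are arbitrary choices for illustration:

```r
# Pairwise log(x + y) computed from lx = log(x) and ly = log(y)
logxpy <- function(lx, ly) max(lx, ly) + log1p(exp(-abs(lx - ly)))

# Ordinary values that a double represents without trouble
x <- 3e-5
y <- 7e-2

direct   <- log(x + y)             # computed on the raw values
via_logs <- logxpy(log(x), log(y)) # computed from the logs only

print(c(direct = direct, via_logs = via_logs))
print(isTRUE(all.equal(direct, via_logs)))
```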

EDIT 2 : Why take the maximum then? To ensure that the argument passed to exp() is negative. If the exponent gets too big, exp() overflows and returns Inf, which is not a correct result. With a large negative exponent, exp() at worst underflows to 0, which allows for a far better result:

Say lx=1 and ly=1000, then :

> 1+log1p(exp(1000-1))
[1] Inf
> 1000+log1p(exp(1-1000))
[1] 1000
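The same guard extends to a whole vector of logs: subtract the largest one before exponentiating, so that every exponent is at most zero. A sketch under that assumption; `log_sum_exp` is a hypothetical helper name, not an R built-in:

```r
# log(sum(exp(lv))) computed without overflow by shifting by the maximum
log_sum_exp <- function(lv) {
  m <- max(lv)
  m + log(sum(exp(lv - m)))  # every exponent is <= 0 here
}

lv <- c(1, 1000)

naive  <- log(sum(exp(lv)))  # exp(1000) overflows to Inf
stable <- log_sum_exp(lv)    # stays finite

print(naive)
print(stable)
```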
Joris Meys answered Sep 30 '22 14:09


The Brobdingnag package deals with very large or small numbers, essentially wrapping Joris's answer into a convenient form.

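library(Brobdingnag)  # install.packages("Brobdingnag") if needed
# Powers of brob objects are kept on the log scale, so they do not underflow
a1 <- as.brob(0.1)^1000
a2 <- as.brob(0.2)^1200
# Normalize while still in brob form
a1_dash <- a1 / (a1 + a2)
a2_dash <- a2 / (a1 + a2)
# Convert the normalized values back to ordinary doubles
as.numeric(a1_dash)
as.numeric(a2_dash)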
Richie Cotton answered Sep 30 '22 14:09