
Arithmetic with very small numbers in R

I'm trying to work with some probabilities that get very small, which causes issues. For example:

probs <- c(4.225867e-03,3.463125e-04,2.480971e-05,1.660538e-06,1.074064e-07,6.829168e-09,4.305051e-10,2.702241e-11,1.692533e-12,1.058970e-13,6.622117e-15,4.139935e-16,2.587807e-17,1.617488e-18,1.010964e-19,6.318630e-21,3.949177e-22,2.468246e-23,1.542657e-24,9.641616e-26,6.026013e-27,3.766259e-28,2.353912e-29,1.471195e-30,9.194971e-32)

However, any arithmetic with this vector causes everything after the 12th entry to be rounded away (probably because it's less than .Machine$double.eps). For example:

probs > 0
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

but

1-probs < 1
[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

I've tried using the gmp package, but I'm doing combinatorics-based calculations and as.bigq(probs) gets really slow when raised to large powers.

Any ways to get around this?

K Morgan asked Apr 04 '17 02:04


1 Answer

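The case of very small probabilities comes up often in machine learning and other statistical computing topics. You are getting a precision error because of the limitations of the internal representation of floating point numbers: a double carries only about 16 significant decimal digits, so each small probability is stored without trouble on its own, but 1 - p rounds to exactly 1 once p drops below roughly 1e-16. This can be solved using arbitrary precision arithmetic, but that is not commonly done.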
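As a rough illustration of that limit, the sketch below takes the 12th and 13th entries of the vector in the question and checks them on either side of the cutoff. The names p_big and p_small are made up for this example.

```r
# Spacing between 1 and the next larger double (about 2.22e-16).
eps <- .Machine$double.eps

# 12th and 13th entries of probs from the question (names are illustrative).
p_big   <- 4.139935e-16
p_small <- 2.587807e-17

# Both values are stored fine on their own.
p_big > 0
p_small > 0

# Subtracting from 1 is where the information is lost:
# p_big still moves the result below 1, p_small is absorbed entirely.
big_survives   <- (1 - p_big) < 1
small_survives <- (1 - p_small) < 1

print(c(eps = eps))
print(c(big_survives = big_survives, small_survives = small_survives))
```

The small value is not lost because it is "too small to store"; it is lost because the result of 1 - p has to be rounded to the nearest double near 1.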

The most popular solution is to use a log transformation to represent your probabilities and then use addition instead of multiplication. This is usually called working in log space, and when the quantity is a likelihood the result is the log-likelihood. This transformation avoids the problem of very small numbers, and in addition, the log values can be used directly to compare the probability of things (because log is monotonic, a lower log-likelihood always means a lower probability).
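A minimal sketch of the log-space approach in base R follows. It uses a few entries from the question's vector, and it assumes (this is a guess, not something the question states) that the combinatorics-based calculation looks like a binomial term choose(n, k) * p^k * (1 - p)^(n - k). The values of n and k are arbitrary placeholders.

```r
# A few entries from the question's vector, enough to show the idea.
probs <- c(4.225867e-03, 1.692533e-12, 2.587807e-17, 9.194971e-32)

# Work with log(p) instead of p: products become sums.
log_p <- log(probs)
log_joint <- sum(log_p)            # log of prod(probs)

# Multiplying many tiny numbers underflows to 0, summing logs does not.
underflowed <- prod(rep(1e-30, 20))       # exactly 0
log_version <- sum(log(rep(1e-30, 20)))   # finite, large and negative

# log(1 - p) without ever forming 1 - p, so tiny p is not rounded away.
log_q <- log1p(-probs)

# Hypothetical binomial-style term in log space (n and k are placeholders).
n <- 1000
k <- 3
log_term <- lchoose(n, k) + k * log_p + (n - k) * log_q

# Base R can also return the same quantity directly on the log scale.
log_term_check <- dbinom(k, size = n, prob = probs, log = TRUE)

print(log_term)
print(all.equal(log_term, log_term_check))
```

Raising to a large power becomes a multiplication by that power on the log scale, which is cheap compared with exact rational arithmetic. Only exponentiate at the very end, and only if the final value is large enough to be representable.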

Note that there is a subtle distinction between likelihood and probability, but the log transformation works regardless: it turns very small positive numbers into negative numbers of moderate magnitude that a double can represent and add accurately.

tkerwin answered Sep 20 '22 02:09