
Java Double overflow

I am trying to compute the likelihood ratio of two things happening together. The equations are straightforward enough, but my data are rather large and the intermediate operations sometimes overflow.

I am currently using double for my variables, so up-casting to a wider primitive type is not possible.
The equation also uses logarithm and exponential operators, and I did not find any non-basic mathematical functions for BigDecimal or similar types.

In addition, I have already tried simplifying the equations as much as possible.

I wonder what my options are here. Here is my code:

    c1 = unigramsInfo.get(w1)[0];
    c2 = unigramsInfo.get(w2)[0];
    c12 = entry.getValue()[0];
    N = additionalInfo.get("tail")[1];

    p = c2 / N;
    p1 = c12 / c1;
    p2 = (c2 - c12) / (N - c1);

    likelihood = -2 * ( c2 * Math.log(p) + (N - c2) * Math.log(1 - p)
                 - c12 * Math.log(p1) - (c1 - c12) * Math.log(1 - p1)
                 - (c2 - c12) * Math.log(p2)
                 - (N - c1 - c2 + c12) * Math.log(1 - p2) );

The N here could be as big as ten million and the probabilities could become as small as 1.0E-7.

Atorpat asked Nov 09 '22 02:11


1 Answer

I've tried your expression. As I don't know where c1, c2, c12 and N come from, I hardcoded their values:

    double c1 = 0.1;
    double c2 = 0.2;
    double c12 = 0.3;
    double N = 0.4;

With these values I got likelihood = NaN.

As mentioned in the comments above, pay attention to the input. The first problematic expressions are the divisions, where a zero or very small denominator gives NaN, Infinity, or a very large value:

    double p = c2 / N;
    double p1 = c12 / c1;
    double p2 = (c2 - c12) / (N - c1);

Then you calculate the logarithms. In my case (with the hardcoded values listed above) the NaN comes from the Math.log(1 - p1) expression: p1 = c12 / c1 is about 3 here, so 1 - p1 is negative, and Math.log (the natural logarithm) returns NaN for a negative argument. This happens whenever c12 > c1.

Generally speaking, you can get not only overflow (in extreme cases) but NaN as well (even for "sane-looking" input).
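To make the NaN cases concrete, here is a small sketch of how Java's double arithmetic behaves at the edges. The `xlogy` helper is hypothetical (it is not part of the original code); it assumes you want the usual convention that 0 * ln(0) counts as 0, which matters when a count such as c1 - c12 is exactly zero.

```java
final class NanDemo {
    /**
     * Hypothetical helper: returns k * ln(x), but treats the case k == 0
     * as 0 instead of letting 0 * ln(0) evaluate to NaN.
     */
    static double xlogy(double k, double x) {
        if (k == 0.0) {
            return 0.0;
        }
        return k * Math.log(x);
    }

    public static void main(String[] args) {
        System.out.println(Math.log(-2.0));       // NaN
        System.out.println(Math.log(0.0));        // -Infinity
        System.out.println(0.0 * Math.log(0.0));  // NaN
        System.out.println(1.0 / 0.0);            // Infinity
        System.out.println(0.0 / 0.0);            // NaN
        System.out.println(xlogy(0.0, 0.0));      // 0.0
    }
}
```

None of these operations throws an exception; the NaN or Infinity simply propagates into the final result.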

My suggestion is to split your long expression into several small Java expressions, verify each value that can cause NaN or overflow before using it, and throw an exception manually when a check fails. This will help to localize the cause of the problem when you get invalid input.
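A minimal sketch of that idea could look like the following. The class and helper names (`LikelihoodCheck`, `requireOpenUnit`, `requireFinite`) are made up for illustration. Two assumptions are baked in: each probability must lie strictly between 0 and 1 (a simplification, since the boundary values would need separate handling), and the last coefficient is taken to be the remaining count (N - c1) - (c2 - c12).

```java
final class LikelihoodCheck {
    /** Throws unless 0 < v < 1, so that log(v) and log(1 - v) are both finite. */
    static double requireOpenUnit(String name, double v) {
        if (!(v > 0.0 && v < 1.0)) { // also rejects NaN
            throw new IllegalArgumentException(name + " must be in (0, 1) but was " + v);
        }
        return v;
    }

    /** Throws if v is NaN or infinite, naming the offending intermediate value. */
    static double requireFinite(String name, double v) {
        if (Double.isNaN(v) || Double.isInfinite(v)) {
            throw new ArithmeticException(name + " is not finite: " + v);
        }
        return v;
    }

    static double likelihood(double c1, double c2, double c12, double n) {
        // Check every probability right after the division that produces it.
        double p  = requireOpenUnit("p", c2 / n);
        double p1 = requireOpenUnit("p1", c12 / c1);
        double p2 = requireOpenUnit("p2", (c2 - c12) / (n - c1));

        // Split the long expression into three named partial sums.
        double pooled = requireFinite("pooled",
                c2 * Math.log(p) + (n - c2) * Math.log(1 - p));
        double withW1 = requireFinite("withW1",
                c12 * Math.log(p1) + (c1 - c12) * Math.log(1 - p1));
        // Assumption: remaining count is (n - c1) - (c2 - c12).
        double withoutW1 = requireFinite("withoutW1",
                (c2 - c12) * Math.log(p2) + (n - c1 - c2 + c12) * Math.log(1 - p2));

        return -2 * (pooled - withW1 - withoutW1);
    }

    public static void main(String[] args) {
        // Illustrative counts only, chosen to be in the size range from the question.
        System.out.println(likelihood(1000, 2000, 100, 1e7));
    }
}
```

With the hardcoded values from this answer (0.1, 0.2, 0.3, 0.4), the call fails immediately with an IllegalArgumentException that names p1, instead of silently returning NaN.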

flaz14 answered Nov 15 '22 07:11