Java code optimization leads to numerical inaccuracies and errors

I'm trying to implement a version of the Fuzzy C-Means algorithm in Java and I'm trying to do some optimization by computing just once everything that can be computed just once.

This is an iterative algorithm and regarding the updating of a matrix, the pixels x clusters membership matrix U (the sum of the values in a row must be 1.0), this is the update rule I want to optimize:

    u_ij = 1 / Σ_{k=1..c} ( ||x_i − v_j|| / ||x_i − v_k|| )^( 2 / (m − 1) )

where x_i is a row of the matrix X (pixels x features) and v_j, v_k are rows of the matrix V (clusters x features). m is a parameter that ranges from 1.1 to infinity and c is the number of clusters. The distance used is the Euclidean norm.

If I had to implement this formula in a straightforward way I'd do:

    for (int i = 0; i < X.length; i++)
    {
        for (int j = 0; j < V.length; j++)
        {
            double num = D[i][j];
            double sumTerms = 0;
            for (int k = 0; k < V.length; k++)
            {
                double thisDistance = D[i][k];
                sumTerms += Math.pow(num / thisDistance, 1.0 / (m - 1.0));
            }
            U[i][j] = (float) (1f / sumTerms);
        }
    }

In this way some optimization is already done: I precomputed all the possible squared distances between X and V and stored them in a matrix D. But that is not enough, since I'm cycling through the elements of V twice, resulting in two nested loops. Looking at the formula, the numerator of the fraction is independent of the sum, so I can compute the numerator and the denominator independently, and the denominator needs to be computed only once per pixel. So I came to a solution like this:

    int nClusters = V.length;
    double exp = 1.0 / (m - 1.0);
    for (int i = 0; i < X.length; i++)
    {
        for (int j = 0; j < nClusters; j++)
        {
            double distance = D[i][j];
            double denominator = D[i][nClusters];
            double numerator = Math.pow(distance, exp);
            U[i][j] = (float) (1f / (numerator * denominator));
        }
    }

Where I precomputed the denominator into an additional column of the matrix D while I was computing the distances:

    for (int i = 0; i < X.length; i++)
    {
        for (int j = 0; j < V.length; j++)
        {
            double sum = 0;
            for (int k = 0; k < nDims; k++)
            {
                final double d = X[i][k] - V[j][k];
                sum += d * d;
            }

            D[i][j] = sum;
            D[i][V.length] += Math.pow(1 / D[i][j], exp);
        }
    }

By doing so I encounter numerical differences between the straightforward computation and the optimized one, which lead to different numerical values in U (not in the first iterations, but soon enough). I guess the problem is that raising very small numbers to high powers (the elements of U can range from 0.0 to 1.0, and exp, for m = 1.1, is 10) leads to very small values, whereas dividing the numerator by the denominator and THEN exponentiating the result seems to be better numerically. The problem is that it involves many more operations.
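To compare the two formulations in isolation, here is a minimal standalone sketch. The class and method names are made up for illustration, and the distance values are hardcoded sample data, not taken from the actual matrices:

```java
// Compares the ratio-first and factored forms of the membership update
// for one pixel. Names and data are illustrative only.
public class StabilityCheck {

    // Ratio-first form: 1 / sum_k (d_j / d_k)^exp
    public static double ratioFirst(double[] d, int j, double exp) {
        double sumTerms = 0;
        for (double dk : d) {
            sumTerms += Math.pow(d[j] / dk, exp);
        }
        return 1.0 / sumTerms;
    }

    // Factored form: 1 / (d_j^exp * sum_k (1 / d_k)^exp)
    public static double factored(double[] d, int j, double exp) {
        double denom = 0;
        for (double dk : d) {
            denom += Math.pow(1.0 / dk, exp);
        }
        return 1.0 / (Math.pow(d[j], exp) * denom);
    }

    public static void main(String[] args) {
        double exp = 10.0;   // 1 / (m - 1) for m = 1.1
        double[] d = {384.6632, 44482.727, 17379.088, 1245.4205};
        // For values of this magnitude, both forms stay well inside the
        // range of a double; the factored form only breaks down when
        // d_j^exp or d_k^(-exp) under- or overflows.
        System.out.println(ratioFirst(d, 0, exp));
        System.out.println(factored(d, 0, exp));
    }
}
```

For distances of this magnitude the two forms agree to roughly seven significant digits, which matches the small discrepancies visible at iteration 0 below; the factoring alone cannot explain the large divergence that appears at iteration 1.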

UPDATE

Some values I get at ITERATION 0:

This is the first row of the matrix D not optimized:

384.6632 44482.727 17379.088 1245.4205

This is the first row of the matrix D the optimized way (note that the last value is the precomputed denominator):

384.6657 44482.7215 17379.0847 1245.4225 1.4098E-26

This is the first row of the U not optimized:

0.99999213 2.3382613E-21 2.8218658E-17 7.900302E-6

This is the first row of the U optimized:

0.9999921 2.338395E-21 2.822035E-17 7.900674E-6

ITERATION 1:

This is the first row of the matrix D not optimized:

414.3861 44469.39 17300.092 1197.7633

This is the first row of the matrix D the optimized way (note that the last value is the precomputed denominator):

414.3880 44469.38 17300.090 1197.7657 2.0796E-26

This is the first row of the U not optimized:

0.99997544 4.9366603E-21 6.216704E-17 2.4565863E-5

This is the first row of the U optimized:

0.3220644 1.5900239E-21 2.0023086E-17 7.912171E-6

The last set of values shows that they are very different due to a propagated error (I still hope I'm making some mistake somewhere), and even the constraint that the sum of the values in a row must be 1.0 is violated.

Am I doing something wrong? Is there a possible solution to get both the code optimized and numerically stable? Any suggestion or criticism will be appreciated.

asked Jan 09 '11 by rano
1 Answer

This problem has nothing to do with floating point stability.

You are getting wrong denominator values on the second and following iterations because you forget to clear its cell before accumulating the sum.

The right denominator for iteration 1 is 6.697905e-27, which is almost exactly 2.0796E-26 − 1.4098E-26.

answered Nov 10 '22 by axtavt