Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cause of comparing long slower than comparing double

I wrote a little program to calculate the first 18 triples (x,y,z) with x<y<z, which satisfy x^3+y^3=z^3+1.

While playing around to optimise the total runtime, I discovered, that using double for the cubic values and the two sides of the equation is faster than using long. On my machine the difference is about 3 seconds.

Now I wonder why exactly this is the case. I guess it is somewhere in the internal handling of long while the comparison of two long-Variables, as this is the only thing, which changes within the calculation loops.

Here is my code:

class Threes {
  public static void main(String[] args) {
    System.out.println("Threes --- Java");
    int Z_MAX = 60000, Y_MAX = Z_MAX-1, X_MAX = Y_MAX-1;
    double[] powers = new double[Z_MAX+1];
    for (int i = 0; i <= Z_MAX; i++) {
      powers[i] = Math.pow(i, 3);
    }
    System.out.println("Powers calculated");
    int x, y, z;
    double right, left;
    int[][] sets = new int[18][3];
    int foundCount = 0;
    long loopCount = 0;
    long start, end;
    start = System.currentTimeMillis();

    for (x = 1 ; x < X_MAX; x++) {
      for (y = x + 1; y < Y_MAX; y++) {
        right = powers[x] + powers[y];
        for (z = y + 1; z < Z_MAX; z++) {
          left = powers[z] + 1;
          if (right < left) {
            z = Z_MAX;
          } else if (right == left) {
            sets[foundCount][0] = x;
            sets[foundCount][1] = y;
            sets[foundCount][2] = z;
            foundCount++;
            end = System.currentTimeMillis();
            System.out.println("found " + foundCount + ". set:\t" + x + "\t" + y + "\t" + z + "\t" + ((end - start) / 1000.0));
            if (foundCount == 18) {
              x = X_MAX;
              y = Y_MAX;
              z = Z_MAX;
            }
          }
          loopCount++;
        }
      }
    }
    System.out.println("finished: " + loopCount);
  }
}

The lines I changed are:

double[] powers = new double[Z_MAX+1];

becomes

long[] powers = new long[Z_MAX+1];

and

powers[i] = Math.pow(i, 3);

becomes

powers[i] = (long)Math.pow(i, 3);

and

double right, left;

becomes

long right, left;

"Bonus Question": What other possibilities of optimizing the whole code in terms of total runtime do I have? I know, that leaving out the loopCount gives me some milliseconds. I'm sure, that I have to reduce the number of loop iterations significantly. But how?

like image 802
Torbjörn Avatar asked Dec 16 '22 12:12

Torbjörn


2 Answers

If you are using 32-bit operating system the performance for long-variable could be worse because long is 64-bit type. For example, with 64-bit OS Java could do the comparison with just one machine instruction, but in 32-bit environment it has to use multiple machine instructions, since it can only handle 32-bit at the time.

But for double, this is not neccessary the case, since 32-bit systems have machine instructions for 64-bit floating point numbers, even when thet don't have them for 64-bit integers.

Also, with code:

powers[i] = (long)Math.pow(i, 3);

there is two unneccesary conversions, first i (integer) is converted to double (that's what Math.pow takes) and then the return value is converted back to 64-bit integer (long).

like image 58
Juho Avatar answered Dec 19 '22 01:12

Juho


It's probably fair to say that your code spends most of its time in this section:

for (z = y + 1; z < Z_MAX; z++) {
    left = powers[z] + 1;
     if (right < left) {
        z = Z_MAX;
     }

And most of the time, it will always be taking the same branch out of the conditional. So once your code has reached the steady-state (i.e. once the CPU's branch predictor is set up), the run-time will be dominated by the computation itself: dependencies are minimised, so the latency of the instruction pipeline doesn't matter.

On a 32-bit machine, doing addition and comparison on 64-bit integer types takes more instructions than doing the equivalent on doubles. A double calculation will take more cycles to complete, but that doesn't matter. We're dominated by instruction throughput, not latency. So the overall run-time will be longer.

In terms of further optimization, you could move the +1 outside the inner loop, by calculating right = powers[x] + powers[y] - 1. But it's possible the optimizer has already spotted that.

like image 23
Oliver Charlesworth Avatar answered Dec 19 '22 02:12

Oliver Charlesworth