Java Loops Optimization

Given the following (straightforward) code:

public class pr1 {

    public static void f1(){
        long sx = 0, s;
        s = System.currentTimeMillis();
        for(long i = 0; i < Integer.MAX_VALUE; ++i){
            sx += i;
        }
        System.out.println("f1(): " + (System.currentTimeMillis() - s));
    }

    public static void f2(){
        long sx = 0, s, i;
        s = System.currentTimeMillis();
        i = Integer.MAX_VALUE;
        while(i-->0){
            sx+=i;
        }
        sx += Integer.MAX_VALUE;
        System.out.println("f2(): " + (System.currentTimeMillis() - s));
    }

    public static void f3(){
        long sx = 0, s, i;
        s = System.currentTimeMillis();
        i = Integer.MAX_VALUE;
        while(--i>0){
            sx+=i;
        }
        sx += Integer.MAX_VALUE;
        System.out.println("f3(): " + (System.currentTimeMillis() - s));
    }

    public static void f4(){
        long sx = 0, s, i;
        s = System.currentTimeMillis();
        i = Integer.MAX_VALUE;
        do{
            sx+=i;
        }while(--i>0);
        System.out.println("f4(): " + (System.currentTimeMillis() - s));
    }

    public static void main(String args[]){
        f1();
        f2();
        f3();
        f4();
    }
}

And the actual results after running the code:

f1(): 5828
f2(): 8125
f3(): 3406
f4(): 3781

Can you please explain the big time differences? In theory the loops achieve the same thing, but in practice there seems to be a significant time difference between the four versions.

After repeated executions the results stay very much the same.

LATER EDIT: As another test, I've rewritten the main method:

public static void main(String args[]){
    for(int i = 0; i < 4; ++i){
        f1(); f2(); f3(); f4();
    }
}

And the new results are:

f1(): 5906
f2(): 8266
f3(): 3406
f4(): 3844
f1(): 5843
f2(): 8125
f3(): 3438
f4(): 3859
f1(): 5891
f2(): 8156
f3(): 3406
f4(): 3813
f1(): 5859
f2(): 8172
f3(): 3438
f4(): 3828

And for 10 repetitions:

f1(): 5844
f2(): 8156
f3(): 3453
f4(): 3813
f1(): 5844
f2(): 8218
f3(): 3485
f4(): 3937
f1(): 5985
f2(): 8156
f3(): 3422
f4(): 3781
f1(): 5828
f2(): 8234
f3(): 3469
f4(): 3828
f1(): 5844
f2(): 8328
f3(): 3422
f4(): 3859
f1(): 5844
f2(): 8188
f3(): 3406
f4(): 3797
f1(): 5906
f2(): 8219
f3(): 3422
f4(): 3797
f1(): 5843
f2(): 8203
f3(): 3454
f4(): 3906
f1(): 5844
f2(): 8140
f3(): 3469
f4(): 3812
f1(): 5860
f2(): 8109
f3(): 3422
f4(): 3813

After removing the calculation from inside the loops, the results are still a little different:

public class pr2 {

    public static void f1(){
        long sx = 0, s;
        s = System.currentTimeMillis();
        for(long i = 0; i < Integer.MAX_VALUE; ++i);
        System.out.println("f1(): " + (System.currentTimeMillis() - s));
    }

    public static void f2(){
        long sx = 0, s, i;
        s = System.currentTimeMillis();
        i = Integer.MAX_VALUE;
        while(i-->0);
        System.out.println("f2(): " + (System.currentTimeMillis() - s));
    }

    public static void f3(){
        long sx = 0, s, i;
        s = System.currentTimeMillis();
        i = Integer.MAX_VALUE;
        while(--i>0);
        System.out.println("f3(): " + (System.currentTimeMillis() - s));
    }

    public static void f4(){
        long sx = 0, s, i;
        s = System.currentTimeMillis();
        i = Integer.MAX_VALUE;
        do{
        }while(--i>0);
        System.out.println("f4(): " + (System.currentTimeMillis() - s));
    }

    public static void main(String args[]){
        for(int i = 0; i < 2; ++i){
            f1(); f2(); f3(); f4();
        }
    }
}

But the time difference still exists:

f1(): 3219
f2(): 4859
f3(): 2610
f4(): 3031
f1(): 3219
f2(): 4812
f3(): 2610
f4(): 3062

JVM:

java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)

LATER EDIT: For the first version, I've compiled with the -O parameter for javac. The new results are:

f1(): 3219
f2(): 4859
f3(): 2610
f4(): 3031

LATER EDIT

Ok, I've tried the same code at home, using a Linux machine with:

java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8) (6b18-1.8-0ubuntu1)
OpenJDK Server VM (build 14.0-b16, mixed mode)

And the results were "normal". No problems now:

f1(): 7495
f2(): 7418
f3(): 7457
f4(): 7384
Asked Jul 20 '10 by Andrei Ciobanu


4 Answers

You're actually benchmarking the JVM, not the code.

See also:

  • What is microbenchmarking?
  • I write a simple loop to time a simple operation and it's slow. What am I doing wrong?
  • How to write a good microbenchmark?

Update: OK, that was a bit of a curt answer. The loop using the postfix operator (i--) seems to be slower than the loops using the prefix operator (--i). That makes sense: with the postfix operator the value is changed during evaluation of the expression, yet the compiler still needs to hold a copy of the original value to use in the expression. The prefix operator avoids the need to hold that copy, because only the changed value is used in the expression.
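To illustrate the semantic difference (a minimal sketch, not code from the question): the postfix form evaluates to the value before the decrement, so that old value has to be kept around, while the prefix form evaluates to the value after the decrement.

public class PrefixVsPostfix {
    public static void main(String[] args) {
        long i = 3;
        long post = i--; // post == 3: the old value is copied first, then i becomes 2
        System.out.println("post = " + post + ", i = " + i);

        long j = 3;
        long pre = --j;  // pre == 2: j is decremented first, no copy of the old value is needed
        System.out.println("pre = " + pre + ", j = " + j);
    }
}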

See also:

  • Prefer prefix operators over postfix

After all, this micro-optimization would save you at most one or two seconds over 2³¹ iterations. Are you really executing it that often in practice? I'd prefer readability over premature optimization.

Answered by BalusC


When I run this code on my JVM (Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)), all four functions give similar results:

f1(): 3234
f2(): 3132
f3(): 3114
f4(): 3089

I would guess that your JVM isn't applying the same optimizations somewhere.

You can examine the bytecode generated for the different functions using javap: javap -l -c pr1. When I do this, I get the following for f2():

public static void f2();
  Code:
   0:   lconst_0
   1:   lstore_0
   2:   invokestatic    #2; //Method java/lang/System.currentTimeMillis:()J
   5:   lstore_2
   6:   ldc2_w  #3; //long 2147483647l
   9:   lstore  4
   11:  lload   4
   13:  dup2
   14:  lconst_1
   15:  lsub
   16:  lstore  4
   18:  lconst_0
   19:  lcmp
   20:  ifle    31
   23:  lload_0
   24:  lload   4
   26:  ladd
   27:  lstore_0
   28:  goto    11
   31:  lload_0
   32:  ldc2_w  #3; //long 2147483647l
   35:  ladd
   36:  lstore_0
   37:  getstatic       #5; //Field java/lang/System.out:Ljava/io/PrintStream;
   40:  new     #6; //class java/lang/StringBuilder
   43:  dup
   44:  invokespecial   #7; //Method java/lang/StringBuilder."<init>":()V
   47:  ldc     #13; //String f2():
   49:  invokevirtual   #9; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   52:  invokestatic    #2; //Method java/lang/System.currentTimeMillis:()J
   55:  lload_2
   56:  lsub
   57:  invokevirtual   #10; //Method java/lang/StringBuilder.append:(J)Ljava/lang/StringBuilder;
   60:  invokevirtual   #11; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   63:  invokevirtual   #12; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   66:  return

One possible reason that f2() is slower may be that the compiler/JVM isn't being smart about the while(i-->0) post-decrement operator. Basically, you need the value of i both before and after the decrement, so this operation involves more work if implemented naively.
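Roughly, a naive translation of while(i-->0) has to keep the pre-decrement value around just for the comparison, which is what the dup2 in the bytecode above is doing. A hand-written expansion (only a sketch of what the bytecode does, not actual compiler output) looks like this:

// Hand expansion of:  while (i-- > 0) { sx += i; }
long sx = 0;
long i = Integer.MAX_VALUE;
while (true) {
    long old = i;    // keep a copy of the current value (the dup2 above)
    i = i - 1;       // store the decremented value back (lsub / lstore)
    if (old <= 0) {  // but compare against the *old* value (lcmp / ifle)
        break;
    }
    sx += i;         // the body sees the already-decremented i
}

The prefix form --i>0 needs no such copy: the decremented value is both stored and compared.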

Answered by Avi


After multiple runs the HotSpot compiler is likely optimizing each method. This takes time and introduces variation. But since the loops all work in about the same way, it seems reasonable that the times would eventually become similar.
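A rough way to account for that (just a sketch, not a proper harness such as JMH) is to run all four methods a few times as a warm-up before paying attention to the printed timings, so HotSpot has already compiled the hot loops by the time you start comparing numbers:

public static void main(String[] args) {
    // Warm-up: give HotSpot a chance to JIT-compile the loops first.
    for (int i = 0; i < 3; ++i) {
        f1(); f2(); f3(); f4();
    }
    System.out.println("--- measured runs ---");
    // Only the timings printed after this point are (somewhat) comparable.
    for (int i = 0; i < 3; ++i) {
        f1(); f2(); f3(); f4();
    }
}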

Answered by seand


I suspect that this has something to do with doing 64-bit arithmetic on a machine with a 32-bit ALU. I suspect that certain combinations of test before/after increment/decrement take longer at the native instruction level due to subtle pipelining effects. The fact that someone reported the numbers were flat on a 64-bit machine supports this theory. The way to confirm it would be to get a dump of the native code generated by the JIT compiler, get hold of the documentation for your particular CPU, and figure out where the clock cycles are going.

But to be honest, I don't know if this is worthwhile. We have clear evidence that your micro-benchmark numbers are CPU dependent, and the "work" done is clearly non-representative. (Why would you use a long loop counter on a 32-bit machine?)
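One way to probe that theory (a hypothetical variant, not part of the original question) is to keep the 64-bit accumulator but switch the loop counter to an int, which fits the 32-bit ALU:

public static void f1Int() {
    long sx = 0, s = System.currentTimeMillis();
    for (int i = 0; i < Integer.MAX_VALUE; ++i) {  // 32-bit counter
        sx += i;                                   // only the accumulator is 64-bit
    }
    System.out.println("f1Int(): " + (System.currentTimeMillis() - s) + " (sx=" + sx + ")");
}

If the long counter is indeed the problem, this variant should run noticeably faster and the gap between the loop styles should shrink.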

And I'm also a bit surprised that the JIT compiler didn't figure out that the loops can (in each case) be completely optimized away.
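If it did, the empty loops in pr2 (and even the summing loops in pr1, since sx is never read) could be removed entirely and all timings would collapse to roughly zero. The usual trick in hand-rolled micro-benchmarks (sketched below; a framework such as JMH does this with its Blackhole) is to make the result observable so the loop cannot be treated as dead code:

public static void f1() {
    long sx = 0, s = System.currentTimeMillis();
    for (long i = 0; i < Integer.MAX_VALUE; ++i) {
        sx += i;
    }
    // Printing sx makes the result observable, so a smarter JIT cannot
    // simply eliminate the loop as dead code.
    System.out.println("f1(): " + (System.currentTimeMillis() - s) + " (sx=" + sx + ")");
}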

Answered by Stephen C