Inspired by this question (now visible only to users with > 10k rep), I came up with the following code:
$ cat loop.c
int main( int argc, char ** argv )
{
    int i = 0;
    while( i++ < 2147483647 );
}
$ cc -o loop loop.c
$ time ./loop
real 0m11.161s
user 0m10.393s
sys 0m0.012s
$ cat Loop.java
class Loop {
    public static void main( String [] args ) {
        int i = 0;
        while( i++ < 2147483647 );
    }
}
$ javac Loop.java
$ time java Loop
real 0m4.578s
user 0m3.980s
sys 0m0.048s
Why does the Java version run about 2.4x faster than the C version (11.2 s vs. 4.6 s)? What am I missing here?
This is run on Ubuntu 9.04 with:
Intel(R) Pentium(R) M @ 1.73GHz
32 bits
EDIT
This is amazing. Using the -O3 option in C optimizes the loop away, and using -server in Java does the same. These are the "optimized times".
I expect javac is defaulting to some higher level of optimization than your C compiler. When I compile with -O3 here, the C is way faster:
C with -O3:
real 0m0.003s
user 0m0.000s
sys 0m0.002s
Your java program:
real 0m0.294s
user 0m0.269s
sys 0m0.051s
Some more details; without optimization, the C compiles to:
0000000100000f18 pushq %rbp
0000000100000f19 movq %rsp,%rbp
0000000100000f1c movl %edi,0xec(%rbp)
0000000100000f1f movq %rsi,0xe0(%rbp)
0000000100000f23 movl $0x00000000,0xfc(%rbp)
0000000100000f2a incl 0xfc(%rbp)
0000000100000f2d movl $0x80000000,%eax
0000000100000f32 cmpl %eax,0xfc(%rbp)
0000000100000f35 jne 0x00000f2a
0000000100000f37 movl $0x00000000,%eax
0000000100000f3c leave
0000000100000f3d ret
With optimization (-O3), it looks like this:
0000000100000f30 pushq %rbp
0000000100000f31 movq %rsp,%rbp
0000000100000f34 xorl %eax,%eax
0000000100000f36 leave
0000000100000f37 ret
As you can see, the entire loop has been removed. javap -c Loop gave me this output for the Java bytecode:
public static void main(java.lang.String[]);
Code:
0: iconst_0
1: istore_1
2: iload_1
3: iinc 1, 1
6: ldc #2; //int 2147483647
8: if_icmpge 14
11: goto 2
14: return
}
It appears the loop is still present in the bytecode, so I guess something happens at runtime to speed it up. (As others have mentioned, the JIT compiler removes the loop.)
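On the C side, the -O3 listing above drops the loop because nothing ever observes the counter. A minimal sketch of one way to keep such a loop in an optimized build is to make the counter volatile, since every access to a volatile object counts as an observable side effect. The function names spin_volatile and spin_plain are hypothetical and not from the original post. The loop is also written so the counter is only incremented while it is below the limit, which avoids signed overflow.

```c
/* Hypothetical helper (not from the original post).
 * Counts from 0 up to `limit` using a volatile counter. Each read and
 * write of a volatile object is an observable side effect, so the
 * compiler has to keep the loop even at -O3. */
int spin_volatile(int limit)
{
    volatile int i = 0;
    while (i < limit) {
        i++;    /* only incremented while i < limit, so it never overflows */
    }
    return i;
}

/* Same count with a plain int. The result can be worked out at compile
 * time, so an optimizing compiler is free to drop the loop entirely. */
int spin_plain(int limit)
{
    int i = 0;
    while (i < limit) {
        i++;
    }
    return i;
}
```

Calling spin_volatile(2147483647) from main and timing the binary at both -O0 and -O3 should show the loop running in both builds, whereas spin_plain may cost nothing at -O3. Exact timings depend on the compiler and machine.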
My guess is that the JIT is optimizing away the empty loop.
Update: The Java Performance Tuning article "Followup to Empty Loop Benchmark" seems to support that, as do the other answers here pointing out that the C code also needs to be optimized for the comparison to be meaningful. Key quote:
Had I chosen to use the client mode 1.4.1 JVM (client is the default mode), the loops would not be optimized away. Had I chosen to use Microsoft's C++ compiler, the C version would take no time. Clearly, the choice of compiler is critical.