Consider the following test programs:
Loop value on the stack

int main(void) {
    int iterations = 1000000000;
    while (iterations > 0)
        --iterations;
}
Loop value on the stack (dereferenced)

int main(void) {
    int iterations = 1000000000;
    int *p = &iterations;
    while (*p > 0)
        --*p;
}
Loop value on the heap

#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof(int));
    *p = 1000000000;
    while (*p > 0)
        --*p;
}
Compiling them with -O0, I get the following execution times:
case1.c
real 0m2.698s
user 0m2.690s
sys 0m0.003s
case2.c
real 0m2.574s
user 0m2.567s
sys 0m0.000s
case3.c
real 0m2.566s
user 0m2.560s
sys 0m0.000s
[edit] Following are the averages over 10 executions:
case1.c
2.70364
case2.c
2.57091
case3.c
2.57000
Why is the execution time longer for the first test case, which seems to be the simplest?
My current architecture is an x86 virtual machine (Archlinux). I get these results with both gcc (4.8.0) and clang (3.3).
[edit 1] The generated assembly is almost identical, except that the second and third cases contain more instructions than the first.
[edit 2] These results are reproducible on my system; each execution has the same order of magnitude.
[edit 3] I don't really care about the performance of a non-optimized program, but I don't understand why it would be slower, and I'm curious.
It's hard to say whether this is the reason, since I'm doing some guessing and you haven't given specifics (like which target you're using). But what I see when I compile without optimizations for an x86 target is the following sequences for decrementing the iterations variable:
Case 1:
L3:
sub DWORD PTR [esp+12], 1
L2:
cmp DWORD PTR [esp+12], 0
jg L3
Case 2:
L3:
mov eax, DWORD PTR [esp+12]
mov eax, DWORD PTR [eax]
lea edx, [eax-1]
mov eax, DWORD PTR [esp+12]
mov DWORD PTR [eax], edx
L2:
mov eax, DWORD PTR [esp+12]
mov eax, DWORD PTR [eax]
test eax, eax
jg L3
One big difference you see in case 1 is that the instruction at L3 both reads and writes the memory location, and it is followed immediately by an instruction that reads the same memory location that was just written. This sort of sequence (the same memory location written and then immediately used in the next instruction) often causes some sort of pipeline stall in modern CPUs.
You'll note that the write followed immediately by a read of the same location is not present in case 2.
Again, this answer is a bit of informed speculation.