Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

why for loop has 1 extra instruction than expected?

I write a lot of vectorized loops, so 1 common idiom is

volatile int dummy[1<<10];
for (int64_t i = 0; i + 16 <= argc; i+= 16)   // process all elements with whole vector
{
  int x = dummy[i];
}
// handle remainder (hopefully with SIMD too)

But the resulting machine code has 1 more instruction than I would like (using gcc 4.9)

.L3:
        leaq    -16(%rax), %rdx
        addq    $16, %rax
        cmpq    %rcx, %rax
        movl    -120(%rsp,%rdx,4), %edx
        jbe     .L3

If I change the code to for (int64_t i = 0; i <= argc - 16; i+= 16), then the "extra" instruction is gone:

.L2:
        movl    -120(%rsp,%rax,4), %ecx
        addq    $16, %rax
        cmpq    %rdx, %rax
        jbe     .L2

But why the difference? I was thinking maybe it was due to loop invariants, but too vaguely. Then I noticed in the 5 instruction case, the increment is done before the load, which would require an extra mov due to x86's destructive 2 operand instructions. So another explanation could be that it's trading instruction parallelism for 1 extra instruction.

Although it seems there would hardly be any performance difference, can someone explain this mystery (preferably who knows about compiler transformations)?

Ideally I would like to keep the i + 16 <= size form since that has a more intuitive meaning (the last element of the vector doesn't go out of bounds)

like image 348
Yale Zhang Avatar asked May 09 '14 22:05

Yale Zhang


People also ask

Why does a loop run'n 1 times?

is executed n-1 times because the upper bound of the range is non-inclusive. So it will be executed when j=0, j=1, ... j = n-1. The loop itself will execute one more time (i.e. when j=n) and see that the value is no longer in range and will not execute the body.

What are the 3 arguments of the for loop?

With three arguments, the sequence starts at the first value, ends before the second argument and increments or decrements by the third value.

Why for loop is efficient?

It limits the scope of counter variables better than a while loop , hence it helps in better memory management.

Can we use for loop two times?

In the for loop, this is happening because of the initialization part i.e., for(i=0;..). Different kinds of loops provide convenience to a programmer while writing code and one can use any kind of loop any number of times in a program.


1 Answers

If argc were below -2147483632, and i was below 2147483632, the expressions i+16 <= argc would be required to yield an arithmetically-correct result, while the expression and i<argc-16 would not. The need to give an arithmetically-correct result in that corner case prevents the compiler from optimizing the former expression to match the latter.

like image 62
supercat Avatar answered Oct 15 '22 09:10

supercat