Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Loop unrolling & optimization

Tags:

c

optimization

Given the code :

for (int i = 0; i < n; ++i) 
{ 
  A(i) ; 
  B(i) ; 
  C(i) ; 
}

And the optimization version :

for (int i = 0; i < (n - 2); i+=3) 
{ 
  A(i) 
  A(i+1) 
  A(i+2) 
  B(i) 
  B(i+1) 
  B(i+2) 
  C(i) 
  C(i+1) 
  C(i+2)
}

Something is not clear to me : which is better ? I can't see anything that works any faster using the other version . Am I missing something here ?

All I see is that each instruction is depending on the previous instruction , meaning that I need to wait that the previous instruction would finish in order to start the one after ...

Thanks

like image 411
JAN Avatar asked Apr 09 '12 21:04

JAN


4 Answers

In the high-level view of a language, you're not going to see the optimization. The speed enhancement comes from what the compiler does with what you have.

In the first case, it's something like:

LOCATION_FLAG;
DO_SOMETHING;
TEST FOR LOOP COMPLETION;//Jumps to LOCATION_FLAG if false

In the second it's something like:

LOCATION_FLAG;
DO_SOMETHING;
DO_SOMETHING;
DO_SOMETHING;
TEST FOR LOOP COMPLETION;//Jumps to LOCATION_FLAG if false

You can see in the latter case, the overhead of testing and jumping is only 1 instruction per 3. In the first it's 1 instruction per 1; so it happens a lot more often.

Therefore, if you have invariants you can rely on (an array of mod 3, to use your example) then it is more efficient to unwind loops because the underlying assembly is written more directly.

like image 50
Nathaniel Ford Avatar answered Oct 04 '22 20:10

Nathaniel Ford


Loop unrolling is used to reduce the number of jump & branch instructions which could potentially make the loop faster but will increase the size of the binary. Depending on the implementation and platform, either could be faster.

like image 35
P.P Avatar answered Oct 04 '22 19:10

P.P


Well, whether this code is "better" or "worse" totally depends on implementations of A, B and C, which values of n you expect, which compiler you are using and which hardware you are running on.

Typically the benefit of loop unrolling is that the overhead of doing the loop (that is, increasing i and comparing it with n) is reduced. In this case, could be reduced by a factor of 3.

like image 30
Johan Kotlinski Avatar answered Oct 04 '22 19:10

Johan Kotlinski


As long as the functions A(), B() and C() don't modify the same datasets, the second verion provides more parallelization options.

In the first version, the three functions could run simultaneously, assuming no interdependencies. In the second version, all three functions could be run with all three datasets at the same time, assuming you had enough execution units to do so and again, no interdependencies.

like image 32
Baldy Avatar answered Oct 04 '22 19:10

Baldy