Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When is loop unwinding effective?

Loop unwinding is a common way to help the compiler to optimize performance. I was wondering if and to what extent the performance gain is affected by what is in the body of the loop:

  1. number of statements
  2. number of function calls
  3. use of complex data types, virtual methods, etc.
  4. dynamic (de)allocation of memory

What rules (of thumb?) do you use to decide whether or not to unwind a performance critical loop? What other optimisation do you consider in these cases?

like image 713
andreas buykx Avatar asked Oct 10 '08 09:10

andreas buykx


4 Answers

In general unrolling loops by hand is not worth the effort. The compiler knows better how the target architecture works and will unroll the loop if it is beneficial.

There are code-paths that benefit when unrolled for Pentium-M type CPU's but don't benefit for Core2 for example. If I unroll by hand the compiler can't make the decision anymore and I may end up with less than optimal code. E.g. exactly the opposite I tried to achieve.

There are several cases where I do unroll performance critical loops by hand, but I only do this if I know that the compiler will - after manual unrolling - be able to use architectural specific feature such as SSE or MMX instructions. Then, and only then I do it.

Btw - modern CPUs are very efficient at executing well predictable branches. This is exactly what a loop is. The loop overhead is so small these days that it rarely makes a difference. Memory latency effects that may occur due to the increase in code-size will however make a difference.

like image 107
Nils Pipenbrinck Avatar answered Oct 29 '22 09:10

Nils Pipenbrinck


This is an optimisation question, and as such there is only one rule of thumb: test the performance, and try a loop unwinding optimisation only if your testing demonstrates that you need to. Consider less disruptive optimisations first.

like image 22
Jon Topper Avatar answered Oct 29 '22 09:10

Jon Topper


In my experience, loop unwinding, and the work it takes is effective when:

  • There are only a few statements within the loop.
  • the statements involve only small number of different variables and no function calls
  • Your operations work on already allocated memory (an in-place image transformation for example)

Partial unwinding is often less work for 80% of the gain. So instead of looping over all pixels of an N by M image (NM iterations) where N is always divisible by 8, loop (NM/8) times over each block of eight pixels. This is especially efficient if you are perfoming some operation that uses some of the neighbouring pixels.

I've had very good results hand-optimising pixel-wise operations into MMX or SSE instructions (8 or 16 pixels at a time) but I've also spent days hand optimising something only to find out that the version optimised by the compiler ran ten times faster.

And by the way, for the most (beautiful|remarkable) example of loop unwinding check out Duffs device

like image 7
jilles de wit Avatar answered Oct 29 '22 10:10

jilles de wit


An important thing to consider: In production code at your workplace, future readability of your code far outweighs the benefits of loop unwinding. Hardware is cheap, programmer time is not. I would only worry about loop unwinding if that is the ONLY way to solve a proven performance problem (say in a low-powered device).

Other thoughts: The characteristics of compilers vary greatly, and in some cases, like Java, the determination is made on the fly by the HotspotJVM, so there I would argue against loop unwinding in any case.

like image 4
Galghamon Avatar answered Oct 29 '22 09:10

Galghamon