
Are there any preprocessor directives that control loop unrolling?

Furthermore, how does the compiler decide how far to unroll a loop, assuming every operation in the loop is completely independent of the other iterations?

Steve Barna asked Oct 11 '12 00:10




2 Answers

For MSVC there is only a vector independence hint: http://msdn.microsoft.com/en-us/library/hh923901.aspx

#pragma loop( ivdep )

For many other compilers, such as Intel and IBM, there are several pragma hints for optimizing a loop:

#pragma unroll
#pragma loop count N
#pragma ivdep
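As a rough sketch of where such a hint goes: the pragma is written on the line directly before the loop it applies to. The function name `sum_array` and the unroll factor 4 are made up for illustration. The preprocessor guards are an assumption about which spelling each compiler accepts (`#pragma unroll(4)` for Clang and Intel, `#pragma GCC unroll 4` for GCC 8 and later); with any other compiler the loop is simply compiled without a hint.

```c
#include <stddef.h>

/* Sums n ints. The pragma only asks for 4 copies of the loop body per
   iteration; the compiler is still free to ignore the request. */
long sum_array(const int *a, size_t n)
{
    long total = 0;
#if defined(__clang__) || defined(__INTEL_COMPILER)
#pragma unroll(4)
#elif defined(__GNUC__) && __GNUC__ >= 8
#pragma GCC unroll 4
#endif
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```

The result is the same with or without the hint; only the generated code changes, so the effect has to be checked in the assembly output or with a benchmark.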

There is a thread with the MSVC++ people about the unroll heuristic: http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/d0b225c2-f5b0-4bb9-ac6a-4d4f61f7cb17/

VC tries to balance execution speed and code size. You can change the balance by using flags /O1 or /O2, but even when optimizing for speed VC tries to conserve code size as well.

Basically, unrolling increases code size, so it may be limited in the Os and O1 modes.

PS: A pragma looks like a preprocessor directive, but it is not. It is a directive for the compiler, and the preprocessor ignores it (keeps it in the output).

osgx answered Oct 21 '22 01:10


In the case of Intel Compiler:

#pragma loop count N gives the compiler the expected trip count, which helps it pick the best strategy for vectorizing the loop. In that sense it also helps to drive the loop unrolling. Examples:

#pragma loop_count min(n),max(n),avg(n)
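A minimal sketch of how that hint might be attached to a loop. The function `scale` and the trip counts 8, 64 and 32 are invented for the example, and the guard is an assumption: it only emits the pragma when an Intel compiler macro is defined, so other compilers see a plain loop.

```c
#include <stddef.h>

/* Multiplies each element of x by factor. The hint says the loop is
   expected to run between 8 and 64 times, 32 on average (made-up values). */
void scale(float *x, size_t n, float factor)
{
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#pragma loop_count min(8), max(64), avg(32)
#endif
    for (size_t i = 0; i < n; ++i)
        x[i] *= factor;
}
```

The values are only a hint for the optimizer; the loop still runs correctly for any n.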

#pragma unroll (n) works only when used with the -O3 flag. You can use it to unroll your loop according to the target processor.

Despite the extra code generated by loop unrolling, it may be worth it, since the compiler will produce a version of the loop for scalar operations as well as one for vector operations.

There are cases where unrolling hurts performance. For instance, a loop with 20 iterations and a vector length of 16 results in one loop that executes 16 operations at once and a remainder loop that executes the other 4 sequentially. To avoid the remainder loop generated by the compiler, we can put this before the loop:

#pragma vector novecremainder //or -mP2OPT_hpo_vec_peel = F to disable peel and remainder loops (compiler internal option)

or

#pragma nounroll // where unrolling is not worthwhile at all
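The split between the main loop and the remainder loop in the 20-iteration case can be modelled by hand. This is only an illustration of the loop structure, not what the Intel compiler literally emits; `add_blocked`, `VLEN` and the two counters are hypothetical names.

```c
#include <stddef.h>

enum { VLEN = 16 };  /* assumed vector length, as in the example above */

/* Adds b into a and reports how many elements each part handled. */
void add_blocked(float *a, const float *b, size_t n,
                 size_t *main_count, size_t *remainder_count)
{
    size_t i = 0;
    size_t main_end = n - n % VLEN;  /* largest multiple of VLEN <= n */

    /* Main loop: whole blocks of VLEN elements (the part a vector loop covers). */
    for (; i < main_end; i += VLEN)
        for (size_t j = 0; j < VLEN; ++j)
            a[i + j] += b[i + j];

    /* Remainder loop: the leftover elements, one at a time. */
    for (; i < n; ++i)
        a[i] += b[i];

    *main_count = main_end;
    *remainder_count = n - main_end;
}
```

With n = 20 the main part covers 16 elements and the remainder loop covers 4, which is the overhead the pragmas above try to avoid.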

Just to clarify the #pragma ivdep :

  • It gives specific hints to modify compiler heuristics about dependencies and must be used only when we know that the assumed dependencies are safe to ignore.
  • Most importantly, it overrides potential dependencies, but the compiler still performs a dependence analysis. Try #pragma simd to vectorize regardless of any analysis.
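A small sketch of the kind of loop where this hint matters. The compiler cannot know the sign of k, so it has to assume that a[i + k] might read an element written by an earlier iteration. The function name `scale_shifted` is made up, and the guards are an assumption about the closest equivalent in each compiler (`#pragma ivdep` for Intel, `#pragma clang loop vectorize(assume_safety)` for Clang, `#pragma GCC ivdep` for GCC).

```c
/* Caller must guarantee k >= 0; only then is the assumed dependence
   safe to ignore, because every read is ahead of the writes. */
void scale_shifted(int *a, int k, int c, int m)
{
#if defined(__INTEL_COMPILER)
#pragma ivdep
#elif defined(__clang__)
#pragma clang loop vectorize(assume_safety)
#elif defined(__GNUC__)
#pragma GCC ivdep
#endif
    for (int i = 0; i < m; ++i)
        a[i] = a[i + k] * c;
}
```

If k could be negative, the hint would be a false promise and the vectorized loop could produce different results from the scalar one.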

Hope this helps.

Igor Freitas answered Oct 21 '22 00:10