I am trying to selectively unroll the second loop in the following program:
#include <stdio.h>

int main()
{
    int in[1000], out[1000];
    int i, j;

    #pragma nounroll
    for (i = 100; i < 1000; i++)
    {
        in[i] += 10;
    }

    #pragma unroll 2
    for (j = 100; j < 1000; j++)
    {
        out[j] += 10;
    }

    return 1;
}
When I compile with clang 3.5 and the following options, both loops are unrolled 4 times:
clang -std=c++11 -O3 -fno-slp-vectorize -fno-vectorize -mllvm -unroll-count=4 -mllvm -debug-pass=Arguments -emit-llvm -c *.cpp
What am I doing wrong? Also, if I add -fno-unroll-loops or drop the -unroll-count=4 flag, neither loop is unrolled.
Any hints on how to debug pragma errors?
Loop unrolling, also known as loop unwinding, is a loop transformation technique that attempts to optimize a program's execution speed at the expense of its binary size, which is an approach known as space–time tradeoff.
From the ARM Compiler toolchain documentation (Using the Compiler, version 4.1): When a loop is unrolled, a loop counter needs to be updated less often and fewer branches are executed. If the loop iterates only a few times, it can be fully unrolled so that the loop overhead completely disappears.
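To make the effect concrete, here is a minimal hand-written sketch of what unrolling by a factor of 2 amounts to. The function names add10_rolled and add10_unrolled2 are hypothetical and only illustrate the transformation; a compiler may generate something different.

```c
#include <stddef.h>

/* Rolled version: one add, one counter update and one branch per element. */
void add10_rolled(int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += 10;
}

/* Hand-unrolled by 2: two adds per counter update and branch. */
void add10_unrolled2(int *a, size_t n)
{
    size_t i = 0;

    /* Main body handles elements in pairs. */
    for (; i + 1 < n; i += 2) {
        a[i]     += 10;
        a[i + 1] += 10;
    }

    /* Remainder: one leftover element when n is odd. */
    if (i < n)
        a[i] += 10;
}
```

Both functions produce the same array contents; the unrolled one executes roughly half as many loop branches, at the cost of more code.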
The UNROLL pragma tells the compiler how many times a loop should be unrolled. It is useful for helping the compiler utilize SIMD instructions, and also in cases where better utilization of software pipeline resources is needed than a non-unrolled loop provides.
I think clang 3.5 does not support these pragmas.
However, starting from 3.6 you can use #pragma clang loop unroll(enable | disable) to enable or disable the compiler's automatic, heuristic-based unrolling for a loop. To unroll a loop completely, use #pragma clang loop unroll(full). You can also use #pragma clang loop unroll_count(N), where N is a positive compile-time integer constant, to specify the unroll count explicitly.
More info here.
Your code rewritten in terms of the above stuff:
#include <stdio.h>

int main()
{
    int in[1000], out[1000];
    int i, j;

    #pragma clang loop unroll(disable)
    for (i = 100; i < 1000; i++)
    {
        in[i] += 10;
    }

    #pragma clang loop unroll_count(2)
    for (j = 100; j < 1000; j++)
    {
        out[j] += 10;
    }

    return 1;
}
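When inspecting the generated IR, it may help to make the loops observable: in the snippet above the arrays are never read afterwards, so an optimizer could in principle discard the loops. The following is a sketch only, not part of the original code. The macro names LOOP_NO_UNROLL and LOOP_UNROLL_2 and the function bump_and_sum are hypothetical, and the macros assume the compiler accepts the clang loop pragmas through _Pragma; on other compilers they expand to nothing.

```c
#define N 1000

/* Hypothetical helper macros: expand to the clang loop pragmas under clang
   and to nothing elsewhere, so other compilers do not warn about them. */
#if defined(__clang__)
#define LOOP_NO_UNROLL _Pragma("clang loop unroll(disable)")
#define LOOP_UNROLL_2  _Pragma("clang loop unroll_count(2)")
#else
#define LOOP_NO_UNROLL
#define LOOP_UNROLL_2
#endif

/* Adds 10 to elements 100..N-1 of both arrays and returns a checksum over
   all elements, so the result of both loops is actually used. */
long bump_and_sum(int *in, int *out)
{
    int i, j;
    long sum = 0;

    LOOP_NO_UNROLL
    for (i = 100; i < N; i++)
        in[i] += 10;

    LOOP_UNROLL_2
    for (j = 100; j < N; j++)
        out[j] += 10;

    for (i = 0; i < N; i++)
        sum += in[i] + out[i];

    return sum;
}
```

Calling bump_and_sum on two caller-initialized arrays of N ints and printing the result keeps both loops live, which makes it easier to compare their bodies in the emitted IR.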
The -unroll-count=4 option has a higher priority than #pragma clang loop unroll_count(2), which is why the loop ends up unrolled by 4: the compiler follows the -unroll-count command-line option, not the pragma. Also, as plasmacel mentioned, #pragma clang loop unroll is not supported before clang 3.6.