Knowing the number of iteration a loop will go through allows the compiler to do some optimization. Consider for instance the two loops below :
Unknown iteration count :
static void bitreverse(vbuf_desc * vbuf)
{
unsigned int idx = 0;
unsigned char * img = vbuf->usrptr;
while(idx < vbuf->bytesused) {
img[idx] = bitrev[img[idx]];
idx++;
}
}
Known iteration count
static void bitreverse(vbuf_desc * vbuf)
{
unsigned int idx = 0;
unsigned char * img = vbuf->usrptr;
while(idx < 1280*400) {
img[idx] = bitrev[img[idx]];
idx++;
}
}
The second version will compile to faster code, because it will be unrolled twice (on ARM with gcc 4.6.3 and -O2 at least). Is there a way to make assertion on the loop count that gcc will take into account when optimizing ?
There is hot
attribute on functions to give a hint to compiler about hot-spot: http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html. Just abb before your function:
static void bitreverse(vbuf_desc * vbuf) __attribute__ ((pure));
Here the docs about 'hot
' from gcc:
hot The hot attribute on a function is used to inform the compiler that the function is a hot spot of the compiled program. The function is optimized more aggressively and on many target it is placed into special subsection of the text section so all hot functions appears close together improving locality. When profile feedback is available, via -fprofile-use, hot functions are automatically detected and this attribute is ignored.
The hot attribute on functions is not implemented in GCC versions earlier than 4.3.
The hot attribute on a label is used to inform the compiler that path following the label are more likely than paths that are not so annotated. This attribute is used in cases where __builtin_expect cannot be used, for instance with computed goto or asm goto.
The hot attribute on labels is not implemented in GCC versions earlier than 4.8.
Also you can try to add __builtin_expect around your idx < vbuf->bytesused
- it will be hint that in most cases the expression is true.
In both cases I'm not sure that your loop will be optimized.
Alternatively you can try profile-guided optimization. Build profile-generating version of program with -fprofile-generate
; run it on target, copy profile data to build-host and rebuild with -fprofile-use
. This will give a lot of information to compiler.
In some compilers (not in GCC) there are loop pragmas, including "#pragma loop count (N)
" and "#pragma unroll (M)
", e.g. in Intel; unroll in IBM; vectorizing pragmas in MSVC
ARM compiler (armcc
) also has some loop pragmas: unroll(n) (via 1):
Loop Unrolling: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348b/CJACACFE.html and http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348b/CJAHJDAB.html
and __promise intrinsic:
Using __promise to improve vectorization
The __promise(expr) intrinsic is a promise to the compiler that a given expression is nonzero. This enables the compiler to improve vectorization by optimizing away code that, based on the promise you have made, is redundant. The disassembled output of Example 3.21 shows the difference that __promise makes, reducing the disassembly to a simple vectorized loop by the removal of a scalar fix-up loop.
Example 3.21. Using __promise(expr) to improve vectorization code
void f(int *x, int n)
{
int i;
__promise((n > 0) && ((n&7)==0));
for (i=0; i<n;i++) x[i]++;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With