#include <stdio.h>
int main() {
int i;
for(i=0;i<10000;i++){
printf("%d",i);
}
}
I want to do loop unrolling on this code using gcc but even using the flag.
gcc -O2 -funroll-all-loops --save-temps unroll.c
the assembled code i am getting contain a loop of 10000 iteration
_main:
Leh_func_begin1:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
pushq %r14
pushq %rbx
Ltmp2:
xorl %ebx, %ebx
leaq L_.str(%rip), %r14
.align 4, 0x90
LBB1_1:
xorb %al, %al
movq %r14, %rdi
movl %ebx, %esi
callq _printf
incl %ebx
cmpl $10000, %ebx
jne LBB1_1
popq %rbx
popq %r14
popq %rbp
ret
Leh_func_end1:
Can somone plz tell me how to implement loop unrolling correctly in gcc
A loop can be unrolled by replicating the loop body a number of times and then changing the termination logic to comprehend the multiple iterations of the loop body (Figure 6.22). The loops in Figures 6.22a and 6.22b each take four cycles to execute, but the loop in Figure 6.22b is doing four times as much work!
The UNROLL pragma specifies to the compiler how many times a loop should be unrolled. The UNROLL pragma is useful for helping the compiler utilize SIMD instructions. It is also useful in cases where better utilization of software pipeline resources are needed over a non-unrolled loop.
Unrolling simply replicates the loop body multiple times, adjusting the loop termination code. Loop unrolling can also be used to improve scheduling. Because it eliminates the branch, it allows instructions from different iterations to be scheduled together.
With -funroll-loops the compiler heuristically decides which loops to unroll. If you want to force unrolling you can use -funroll-all-loops , but it usually makes the code run slower.
Loop unrolling won't give you any benefit for this code, because the overhead of the function call to printf()
itself dominates the work done at each iteration. The compiler may be aware of this, and since it is being asked to optimize the code, it may decide that unrolling increases the code size for no appreciable run-time performance gain, and decides the risk of incurring an instruction cache miss is too high to perform the unrolling.
The type of unrolling required to speed up this loop would require reducing the number of calls to printf()
itself. I am unaware of any optimizing compiler that is capable of doing that.
As an example of unrolling the loop to reduce the number of printf()
calls, consider this code:
void print_loop_unrolled (int n) {
int i = -8;
if (n % 8) {
printf("%.*s", n % 8, "01234567");
i += n % 8;
}
while ((i += 8) < n) {
printf("%d%d%d%d%d%d%d%d",i,i+1,i+2,i+3,i+4,i+5,i+6,i+7);
}
}
gcc
has maximum loops unroll parameters.
You have to use -O3 -funroll-loops
and play with parameters max-unroll-times
, max-unrolled-insns
and max-average-unrolled-insns
.
Example:
-O3 -funroll-loops --param max-unroll-times=200
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With