Why do compilers seem to prefer to optimize pretest loops into a check, conditional jump over, then a do-while structure, versus doing an unconditional jump inside the the do while loop?
I wrote a function that is written in the second style I described, but both g++ and clang undo it and transform it into method one. https://godbolt.org/g/2Dvudi
I'm confused because it seems like the compiler duplicates a lot of instructions (maybe not that much for this example) for the pretest. Also, It may do a jump anyway (though maybe it is statically predicted to be not taken, and not a big deal in the average case), so why not just do an unconditional jump always?
Here is one thought I have about it, but It doesn't strongly support either method:
The loop wants to be aligned, so there may be room to duplicate some instructions upfront anyway without wasting space, since they'd be padded with nops. However, both clang and gcc emit code that exceeds 16-bytes for the pretest, and end up inserting a large nop afterwards.
Edit: Here is the code from the godbolt link:
typedef unsigned char uchar;
unsigned my_atoi(const uchar *p)//sentinel at end
{
unsigned acm=0u;
unsigned d;
goto LEnter;
do{
acm = acm*10u + d;
LEnter:
d = *p++ - '0';
}while (d<10u);
return acm;
}
clang 5.0 at -O2 emits:
my_atoi(unsigned char const*): # @my_atoi(unsigned char const*)
movzx ecx, byte ptr [rdi]
add ecx, -48
xor eax, eax
cmp ecx, 9
ja .LBB0_3
inc rdi
xor eax, eax
.LBB0_2: # =>This Inner Loop Header: Depth=1
lea eax, [rax + 4*rax]
lea eax, [rcx + 2*rax]
movzx ecx, byte ptr [rdi]
add ecx, -48
inc rdi
cmp ecx, 10
jb .LBB0_2
.LBB0_3:
ret
Quoting some comments from GCC sources of a relevant optimization pass.
Duplicates headers of loops if they are small enough, so that the statements in the loop body are always executed when the loop is entered. This increases effectiveness of code motion optimizations, and reduces the need for loop preconditioning.
I.e., if the later passes find some loop-invariant code, they will have a place to move that code to without a need to add a check, whether the loop will iterate at all.
For all loops, copy the condition at the end of the loop body in front of the loop. This is beneficial since it increases efficiency of code motion optimizations. It also saves one jump on entry to the loop.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With