int test1(int a, int b) {
if (__builtin_expect(a < b, 0))
return a / b;
return b;
}
The function above was compiled by clang with -O3 -march=native to:
test1(int, int): # @test1(int, int)
cmp edi, esi
jl .LBB0_1
mov eax, esi
ret
.LBB0_1:
mov eax, edi
cdq
idiv esi
mov esi, eax
mov eax, esi # moving eax back and forth
ret
Why is eax being moved back and forth after the idiv?
gcc has similar behavior, so this seems to be intended.
gcc with -O3 -march=native compiled the code to:
test1(int, int):
mov r8d, esi
cmp edi, esi
jl .L4
mov eax, r8d
ret
.L4:
mov eax, edi
cdq
idiv esi
mov r8d, eax
mov eax, r8d #back and forth mov
ret
godbolt
This is not a complete solution to the puzzle, but it should give some clues.
Without the __builtin_expect, clang generates:
test2(int, int): # @test2(int, int)
mov ecx, esi
cmp edi, esi
jge .LBB1_2
mov eax, edi
cdq
idiv ecx
mov ecx, eax
.LBB1_2:
mov eax, ecx
ret
While the register allocation is still weird here, it at least makes sense: if the branch is taken (a >= b), the value of b in ecx is transferred to eax as the return value. If it is not taken, the result of the division (in eax) has to be transferred to ecx so that it sits in the same register as in the other case before the shared mov eax, ecx at .LBB1_2.
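For reference, the C source behind the test2 listing is not shown here. A minimal sketch, assuming it is simply test1 with the branch hint removed (the name test2 is taken from the assembly label; the body is an assumption):

```c
/* Assumed source for the test2 listing: identical to test1,
   but without __builtin_expect marking a < b as unlikely. */
int test2(int a, int b) {
    if (a < b)
        return a / b;  /* division path; undefined if b == 0 */
    return b;          /* fall-through path shared with the division result */
}
```

Both versions compute the same result; the hint should only influence code layout and, apparently, when the two return paths get split apart.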
It could be that the __builtin_expect convinces the compiler, late in the compilation process, to special-case the path where the branch is taken, orphaning the .LBB1_2 label and causing it to be ultimately absent from the assembly.
idiv esi is 32-bit operand-size, so EAX is already zero-extended to fill RAX. Therefore copying to ESI or R8D and back has no effect on the value in EAX. (And the calling convention doesn't require zero-extension or sign-extension to 64-bit anyway; 32-bit types are returned in 32-bit registers with possible garbage in the upper 32.)
This looks like purely a missed optimization. (There's no microarchitectural performance reason that this would be a good thing either.)
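The zero-extension argument can be modeled in plain C. This is only a sketch of the register semantics, not real hardware access; write_reg32 and round_trip are hypothetical helper names introduced for illustration.

```c
#include <stdint.h>

/* Model of the x86-64 rule: writing a 32-bit register (EAX, ESI, R8D)
   zeroes the upper 32 bits of the full 64-bit register.
   The previous 64-bit contents are therefore irrelevant. */
static uint64_t write_reg32(uint64_t old64, uint32_t value) {
    (void)old64;
    return (uint64_t)value;
}

/* Mirrors "mov esi, eax ; mov eax, esi" applied to a given RAX value. */
static uint64_t round_trip(uint64_t rax) {
    uint64_t rsi = write_reg32(0xDEADBEEF00000000ULL, (uint32_t)rax);
    return write_reg32(rax, (uint32_t)rsi);
}
```

If RAX already holds a zero-extended 32-bit result, as it does after a 32-bit idiv, round_trip returns it unchanged, so in this model the two mov instructions are a no-op.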