Compiler generating a mov back and forth on eax

int test1(int a, int b) {
    if (__builtin_expect(a < b, 0))
        return a / b;
    return b;
}

This function was compiled by clang with -O3 -march=native to:

test1(int, int):                             # @test1(int, int)
        cmp     edi, esi
        jl      .LBB0_1
        mov     eax, esi
        ret
.LBB0_1:
        mov     eax, edi
        cdq
        idiv    esi
        mov     esi, eax
        mov     eax, esi  # moving eax back and forth
        ret

Why is eax being moved back and forth after the idiv?

gcc shows similar behavior, so this seems to be intended.

gcc with -O3 -march=native compiled the code to:

test1(int, int):
        mov     r8d, esi
        cmp     edi, esi
        jl      .L4
        mov     eax, r8d
        ret
.L4:
        mov     eax, edi
        cdq
        idiv    esi
        mov     r8d, eax
        mov     eax, r8d  #back and forth mov
        ret


Tyker asked Oct 17 '22 05:10



2 Answers

This is not a complete solution to the puzzle but should give some clues.

Without the __builtin_expect, clang generates:

test2(int, int):                             # @test2(int, int)
        mov     ecx, esi
        cmp     edi, esi
        jge     .LBB1_2
        mov     eax, edi
        cdq
        idiv    ecx
        mov     ecx, eax
.LBB1_2:
        mov     eax, ecx
        ret

While the register allocation is still weird here, it at least makes sense: if the branch is taken, the value of b in ecx is transferred to eax as the return value. If it is not taken, the result of the division (in eax) has to be moved to ecx so that it sits in the same register as in the other case before the shared mov eax, ecx at .LBB1_2.
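The answer does not show the source for test2. Presumably it is the question's function with the hint removed, so the version below is an assumption. The helper check_same_result is a hypothetical name added for illustration; it is a small sanity check that the hint changes only code layout and not results.

```c
#include <assert.h>

/* The question's function: the division path is hinted as unlikely. */
int test1(int a, int b) {
    if (__builtin_expect(a < b, 0))
        return a / b;
    return b;
}

/* Assumed source for test2: the same logic with the hint removed. */
int test2(int a, int b) {
    if (a < b)
        return a / b;
    return b;
}

/* Hypothetical helper: both versions must agree on every input. */
void check_same_result(void) {
    for (int a = -50; a <= 50; ++a)
        for (int b = -50; b <= 50; ++b)
            if (b != 0 || a >= b)  /* skip inputs that would compute a / 0 */
                assert(test1(a, b) == test2(a, b));
}
```

Compiling both functions with the same flags and comparing the output side by side should show the difference in block layout described above, though the exact assembly depends on the compiler version.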

It could be that __builtin_expect convinces the compiler, late in the compilation process, to special-case the path where the branch is taken, orphaning the .LBB1_2 label and causing it to be ultimately absent from the assembly. That would leave behind the mov pair that originally fed the shared tail.

fuz answered Oct 20 '22 22:10



idiv esi has 32-bit operand size, so the quotient it writes to EAX is already zero-extended to fill RAX. Therefore copying to ESI or R8D and back has no effect on the value in EAX. (And the calling convention doesn't require zero-extension or sign-extension to 64-bit anyway; 32-bit types are returned in 32-bit registers with possible garbage in the upper 32 bits.)
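The zero-extension argument can be sketched in plain C by modelling registers as 64-bit integers. This is only a model, not what the hardware or either compiler does internally, and write32, round_trip, after_idiv and check_round_trip are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an x86-64 write to a 32-bit register: the upper 32 bits of the
   full 64-bit register are cleared. */
static uint64_t write32(uint32_t value) {
    return (uint64_t)value;
}

/* Model of "mov esi, eax" followed by "mov eax, esi"; returns the new RAX. */
uint64_t round_trip(uint64_t rax) {
    uint64_t rsi = write32((uint32_t)rax);  /* mov esi, eax */
    return write32((uint32_t)rsi);          /* mov eax, esi */
}

/* Model of RAX after a 32-bit idiv: the quotient, already zero-extended. */
uint64_t after_idiv(int32_t a, int32_t b) {
    return write32((uint32_t)(a / b));
}

/* The round trip leaves a value produced by after_idiv unchanged. */
void check_round_trip(void) {
    uint64_t rax = after_idiv(-9, 2);
    assert(round_trip(rax) == rax);
}
```

In this model the mov pair could only change RAX if its upper 32 bits were non-zero beforehand, which cannot be the case right after a 32-bit idiv.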

This looks like purely a missed optimization. (There's no microarchitectural performance reason that this would be a good thing either.)

Peter Cordes answered Oct 21 '22 00:10
