Compiler generating a mov back and forth on eax

Question

int test1(int a, int b) {
    if (__builtin_expect(a < b, 0))
        return a / b;
    return b;
}

was compiled by clang with -O3 -march=native to

test1(int, int):                             # @test1(int, int)
        cmp     edi, esi
        jl      .LBB0_1
        mov     eax, esi
        ret
.LBB0_1:
        mov     eax, edi
        cdq
        idiv    esi
        mov     esi, eax
        mov     eax, esi  # moving eax back and forth
        ret

Why is eax being moved back and forth after the idiv?

gcc behaves similarly, so this seems to be intended.

gcc with -O3 -march=native compiled the code to

test1(int, int):
        mov     r8d, esi
        cmp     edi, esi
        jl      .L4
        mov     eax, r8d
        ret
.L4:
        mov     eax, edi
        cdq
        idiv    esi
        mov     r8d, eax
        mov     eax, r8d  #back and forth mov
        ret

godbolt

fuz · Accepted Answer

This is not a complete solution to the puzzle but should give some clues.

Without the __builtin_expect, clang generates:

test2(int, int):                             # @test2(int, int)
        mov     ecx, esi
        cmp     edi, esi
        jge     .LBB1_2
        mov     eax, edi
        cdq
        idiv    ecx
        mov     ecx, eax
.LBB1_2:
        mov     eax, ecx
        ret

While the register allocation is still weird here, it at least makes sense: if the branch is taken, the value of b in ecx is transferred to eax as the return value. If it is not taken, the result of the division (in eax) has to be transferred to ecx so that it is in the same register as in the other case when both paths meet at .LBB1_2.

It could be that __builtin_expect makes the compiler special-case the path where the branch is taken, late in the compilation process, which orphans the .LBB1_2 label and causes it to be absent from the final assembly.
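For reference, the variants being compared can be written side by side. The source of test2 is not shown in this answer; the version below assumes it is test1 with the hint removed. test2_single_exit and its local variable result are hypothetical names, added only to mirror the merged-return layout of the test2 listing in source form. This is a sketch for experimenting on godbolt, not a claim about what the compiler does internally.

```cpp
// Variant from the question: the hint marks the division path as unlikely.
// __builtin_expect is a GCC/clang builtin, so this needs one of those compilers.
int test1(int a, int b) {
    if (__builtin_expect(a < b, 0))
        return a / b;
    return b;
}

// Assumed source for the test2 listing: same logic, no hint.
int test2(int a, int b) {
    if (a < b)
        return a / b;
    return b;
}

// Hypothetical single-exit rewrite that mirrors the test2 listing:
// both paths store into one variable and share one return.
int test2_single_exit(int a, int b) {
    int result = b;        // plays the role of "mov ecx, esi"
    if (a < b)
        result = a / b;    // quotient is merged, like "mov ecx, eax"
    return result;         // shared exit, like "mov eax, ecx" at .LBB1_2
}
```

All three functions return the same value for the same inputs, so any difference in the generated code comes from how the compiler lays out the two paths, not from the semantics.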

Peter Cordes · Answer

idiv esi is 32-bit operand-size, so EAX is already zero-extended to fill RAX. Therefore copying to ESI or R8D and back has no effect on the value in EAX. (And the calling convention doesn't require zero-extension or sign-extension to 64-bit anyway; 32-bit types are returned in 32-bit registers with possible garbage in the upper 32.)

This looks like purely a missed optimization. (There's no microarchitectural performance reason that this would be a good thing either.)
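The zero-extension rule can be checked with a small experiment. The sketch below is an assumption-laden illustration: it uses GNU-style inline asm (GCC or clang, x86-64 only, default AT&T syntax rather than the Intel syntax of the listings above), it uses a plain 32-bit mov instead of idiv to stay short, and write_low32 is a hypothetical helper name.

```cpp
#include <cstdint>

// Overwrites the low 32 bits of `full` with `low` using a 32-bit mov
// and returns the resulting 64-bit register contents.
std::uint64_t write_low32(std::uint64_t full, std::uint32_t low) {
#if defined(__x86_64__) && defined(__GNUC__)
    // %k0 prints the 32-bit name of the register that holds `full`.
    // AT&T operand order: source first, destination second.
    asm("movl %1, %k0" : "+r"(full) : "r"(low));
    return full;
#else
    // Other targets: this only models the x86-64 result, it does not demonstrate it.
    (void)full;
    return low;
#endif
}
```

Calling write_low32 with all 64 bits of full set returns just the value of low, with the upper half cleared. That is why a round trip through esi or r8d after the idiv cannot change anything in eax or rax.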

Tags: c++, gcc, assembly, x86-64, micro-optimization

Tyker
