I have written the following very simple code, which I am experimenting with in Godbolt's Compiler Explorer:
#include <cstdint>

uint64_t func(uint64_t num, uint64_t den)
{
    return num / den;
}
GCC produces the following output, which I would expect:
func(unsigned long, unsigned long):
        mov     rax, rdi
        xor     edx, edx
        div     rsi
        ret
However, Clang 13.0.0 produces the following, which even involves a shift and a conditional jump:
func(unsigned long, unsigned long):          # @func(unsigned long, unsigned long)
        mov     rax, rdi
        mov     rcx, rdi
        or      rcx, rsi
        shr     rcx, 32
        je      .LBB0_1
        xor     edx, edx
        div     rsi
        ret
.LBB0_1:
        xor     edx, edx
        div     esi
        ret
When using uint32_t, clang's output is once again "simple" and what I would expect.
It seems this might be some sort of optimization, since Clang 10.0.1 produces the same output as GCC, but I cannot understand what is happening. Why is Clang producing this longer assembly?
The assembly checks whether either num or den is 2**32 or larger: it ORs the two operands together, shifts the result right by 32 bits, and tests whether the high bits were all zero. Depending on the outcome, either a 64-bit division (div rsi) or a 32-bit division (div esi) is performed.
Presumably this code is generated because the compiler writers think the cost of the extra check and potential branch is outweighed by the savings of avoiding an unnecessary 64-bit division, which is considerably slower than a 32-bit division on typical x86-64 CPUs.
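To illustrate, the transformation is roughly equivalent to hand-writing the C++ below. This is only a sketch of the logic the emitted assembly performs, not Clang's actual internal form, and the name func_sketch is made up for illustration:

#include <cstdint>

uint64_t func_sketch(uint64_t num, uint64_t den)
{
    // If the high 32 bits of both operands are zero, both values fit in
    // 32 bits, so a 32-bit division produces the same quotient.
    if (((num | den) >> 32) == 0)
        return static_cast<uint32_t>(num) / static_cast<uint32_t>(den);

    // Otherwise fall back to the full 64-bit division.
    return num / den;
}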
If I understand correctly, it just checks whether either of the operands is larger than 32 bits and uses a different div instruction for values that fit in 32 bits than for larger ones.