Modern CPU's can perform extended multiplication between two native-size words and store the low and high result in separate registers. Similarly, when performing division, they store the quotient and the remainder in two different registers instead of discarding the unwanted part.
Is there some sort of portable gcc intrinsic which would take the following signature:
void extmul(size_t a, size_t b, size_t *lo, size_t *hi);
Or something like that, and for division:
void extdiv(size_t a, size_t b, size_t *q, size_t *r);
I know I could do it myself with inline assembly and shoehorn portability into it by throwing #ifdef's in the code, or I could emulate the multiplication part using partial sums (which would be significantly slower) but I would like to avoid that for readability. Surely there exists some built-in function to do this?
For gcc since version 4.6 you can use __int128
. This works on most 64 bit hardware. For instance
To get the 128 bit result of a 64x64 bit multiplication just use
void extmul(size_t a, size_t b, size_t *lo, size_t *hi) {
__int128 result = (__int128)a * (__int128)b;
*lo = (size_t)result;
*hi = result >> 64;
}
On x86_64 gcc is smart enough to compile this to
0: 48 89 f8 mov %rdi,%rax
3: 49 89 d0 mov %rdx,%r8
6: 48 f7 e6 mul %rsi
9: 49 89 00 mov %rax,(%r8)
c: 48 89 11 mov %rdx,(%rcx)
f: c3 retq
No native 128 bit support or similar required, and after inlining only the mul
instruction remains.
Edit: On a 32 bit arch this works in a similar way, you need to replace __int128_t
by uint64_t
and the shift width by 32. The optimization will work on even older gccs.
For those wondering about the other half of the question (division), gcc does not provide an intrinsic for that because the processor division instructions don't conform to the standard.
This is true both with 128-bit dividends on 64-bit x86 targets and 64-bit dividends on 32-bit x86 targets. The problem is that DIV will cause divide overflow exceptions in cases where the standard says the result should be truncated. For example (unsigned long long) (((unsigned _int128) 1 << 64) / 1)
should evaluate to 0, but would cause divide overflow exception if evaluated with DIV.
(Thanks to @ross-ridge for this info)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With