I would like my C function to efficiently compute the high 64 bits of the product of two 64 bit signed ints. I know how to do this in x86-64 assembly, with imulq and pulling the result out of %rdx. But I'm at a loss for how to write this in C at all, let alone coax the compiler to do it efficiently.
Does anyone have any suggestions for writing this in C? This is performance sensitive, so "manual methods" (like Russian Peasant, or bignum libraries) are out.
This dorky inline assembly function I wrote works and is roughly the codegen I'm after:
static long mull_hi(long inp1, long inp2) {
long output = -1;
__asm__("movq %[inp1], %%rax;"
"imulq %[inp2];"
"movq %%rdx, %[output];"
: [output] "=r" (output)
: [inp1] "r" (inp1), [inp2] "r" (inp2)
:"%rax", "%rdx");
return output;
}
If you're using a relatively recent GCC on x86_64:
int64_t mulHi(int64_t x, int64_t y) {
return (int64_t)((__int128_t)x*y >> 64);
}
At -O1 and higher, this compiles to what you want:
_mulHi:
0000000000000000 movq %rsi,%rax
0000000000000003 imulq %rdi
0000000000000006 movq %rdx,%rax
0000000000000009 ret
I believe that clang and VC++ also have support for the __int128_t type, so this should also work on those platforms, with the usual caveats about trying it yourself.
The general answer is that x * y
can be broken down into (a + b) * (c + d)
, where a
and c
are the high order parts.
First, expand to ac + ad + bc + bd
Now, you multiply the terms as 32 bit numbers stored as long long
(or better yet, uint64_t
), and you just remember that when you multiplied a higher order number, you need to scale by 32 bits. Then you do the adds, remembering to detect carry. Keep track of the sign. Naturally, you need to do the adds in pieces.
For code implementing the above, see my other answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With