Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Computing high 64 bits of a 64x64 int product in C

Tags:

c

math

64-bit

I would like my C function to efficiently compute the high 64 bits of the product of two 64 bit signed ints. I know how to do this in x86-64 assembly, with imulq and pulling the result out of %rdx. But I'm at a loss for how to write this in C at all, let alone coax the compiler to do it efficiently.

Does anyone have any suggestions for writing this in C? This is performance sensitive, so "manual methods" (like Russian Peasant, or bignum libraries) are out.

This dorky inline assembly function I wrote works and is roughly the codegen I'm after:

static long mull_hi(long inp1, long inp2) {
    long output = -1;
    __asm__("movq %[inp1], %%rax;"
            "imulq %[inp2];"
            "movq %%rdx, %[output];"
            : [output] "=r" (output)
            : [inp1] "r" (inp1), [inp2] "r" (inp2)
            :"%rax", "%rdx");
    return output;
}
like image 935
fish Avatar asked Oct 09 '09 01:10

fish


2 Answers

If you're using a relatively recent GCC on x86_64:

int64_t mulHi(int64_t x, int64_t y) {
    return (int64_t)((__int128_t)x*y >> 64);
}

At -O1 and higher, this compiles to what you want:

_mulHi:
0000000000000000    movq    %rsi,%rax
0000000000000003    imulq   %rdi
0000000000000006    movq    %rdx,%rax
0000000000000009    ret

I believe that clang and VC++ also have support for the __int128_t type, so this should also work on those platforms, with the usual caveats about trying it yourself.

like image 135
Stephen Canon Avatar answered Sep 19 '22 14:09

Stephen Canon


The general answer is that x * y can be broken down into (a + b) * (c + d), where a and c are the high order parts.

First, expand to ac + ad + bc + bd

Now, you multiply the terms as 32 bit numbers stored as long long (or better yet, uint64_t), and you just remember that when you multiplied a higher order number, you need to scale by 32 bits. Then you do the adds, remembering to detect carry. Keep track of the sign. Naturally, you need to do the adds in pieces.

For code implementing the above, see my other answer.

like image 43
DigitalRoss Avatar answered Sep 18 '22 14:09

DigitalRoss