Here is code that adds two triples of unboxed Words, each representing a 192-bit number, into a new triple of unboxed Words, and also returns any overflow:
    {-# LANGUAGE MagicHash #-}
    {-# LANGUAGE UnboxedTuples #-}

    import GHC.Prim (plusWord2#, plusWord#, Word#)

    longAdd ::
      (# Word#, Word#, Word# #) ->
      (# Word#, Word#, Word# #) ->
      (# Word#, (# Word#, Word#, Word# #) #)
    longAdd (# xl, xm, xh #) (# yl, ym, yh #) =
      let
        -- x + y + c, returning (# carry out, low word #).
        -- plusWord2# returns (# high word, low word #), so the high word is the carry.
        plusWord3 x y c =
          let
            (# c1, r1 #) = plusWord2# x y
            (# c2, r2 #) = plusWord2# r1 c
          in
            -- c1 and c2 cannot both be 1 when c is 0 or 1, so their sum is the carry out.
            (# plusWord# c1 c2, r2 #)
        (# cl, rl #) = plusWord2# xl yl
        (# cm, rm #) = plusWord3 xm ym cl
        (# ch, rh #) = plusWord3 xh yh cm
      in
        (# ch, (# rl, rm, rh #) #)
The issue is the "plusWord3" definition. Ideally, this is just like an "adc" function, which takes two words and the carry bit and returns the result and a new carry, so the resulting assembly is like the following:
    add x1 y1
    adc x2 y2
    adc x3 y3
Unfortunately GHC, whether using the native code generator or LLVM, produces ugly assembly that saves the carry bit to a register and then reads it back via a separate extra add, instead of just using adc. I don't want to call an external C function to achieve this: once you add the call overhead it's probably not worth it, and I'd like to stay in Haskell so the code can be inlined where possible. But I also want to be able to coax the compiler into producing the adc instruction where appropriate. Is there any way I can achieve that?
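For experimenting with this, a boxed wrapper around the same carry chain can help: it is easy to cross-check against Integer arithmetic, and compiling it with `ghc -O2 -ddump-asm` shows what the backend emits for each step. The following is only a sketch; the names `adcStep`, `longAddBoxed` and `toInteger3` are made up for illustration and are not part of any library.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import GHC.Exts (Word (W#), plusWord2#, plusWord#)

-- | One carry-propagating step (hypothetical helper): computes x + y + c
-- and returns (carry out, low word). The carry in is expected to be 0 or 1.
adcStep :: Word -> Word -> Word -> (Word, Word)
adcStep (W# x) (W# y) (W# c) =
  case plusWord2# x y of
    (# c1, r1 #) -> case plusWord2# r1 c of
      -- c1 and c2 are never both 1, so adding them gives the carry out.
      (# c2, r2 #) -> (W# (plusWord# c1 c2), W# r2)

-- | Add two three-limb numbers given as (low, middle, high);
-- returns (overflow, (low, middle, high)).
longAddBoxed ::
  (Word, Word, Word) -> (Word, Word, Word) -> (Word, (Word, Word, Word))
longAddBoxed (xl, xm, xh) (yl, ym, yh) =
  let (cl, rl) = adcStep xl yl 0
      (cm, rm) = adcStep xm ym cl
      (ch, rh) = adcStep xh yh cm
  in (ch, (rl, rm, rh))

-- | Interpret a limb triple as an Integer, for cross-checking only.
toInteger3 :: (Word, Word, Word) -> Integer
toInteger3 (l, m, h) = toInteger l + b * (toInteger m + b * toInteger h)
  where
    b = toInteger (maxBound :: Word) + 1
```

Comparing `toInteger3` of the result (plus the overflow word scaled by the cube of the word base) with the Integer sum of the inputs gives a quick sanity check that the carry logic is right before looking at the generated code.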
The most reliable and efficient way would be to call a primop directly in your program.
Using an FFI call is the easiest way, but as you noted it won't be the most efficient, because of the FFI overhead.
Even if the compiler supported the instruction you want and used it in some programs, relying on that would be fragile. Seemingly innocent changes to your program may result in different generated assembly that doesn't use the instruction.
So my proposal is:
I'm not familiar with low-level programming, but after asking around on Freenode's #ghc channel, I got a pointer to the addIntC# primop, which is related to LLVM's llvm.sadd.with.overflow. I'm not sure what LLVM compiles that into.
GHC's native code generator seems to know about the adc instruction: see X86/CodeGen.hs. But as the comment there says:
we handle addition, but rather badly
Edit: you are working with words. It seems that the LLVM backend does compile MO_Add2 (which is another name for plusWord2#) to llvm.uadd.with.overflow in https://github.com/ghc/ghc/blob/2b7d9c2b96eb9da3cce7826df4a91c3426095528/compiler/llvmGen/LlvmCodeGen/CodeGen.hs#L737 ; related ticket: https://ghc.haskell.org/trac/ghc/ticket/9430
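For words, the unsigned counterpart of addIntC# is addWordC#, which returns the sum together with a carry flag. Below is a rough sketch of a carry step built on it. It assumes a GHC recent enough to export addWordC# from GHC.Exts; the name `addWordCarry` is invented for this example, and nothing here guarantees that either backend will turn it into adc.

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import GHC.Exts (Int (I#), Word (W#), addWordC#, orI#)

-- | x + y + carryIn, returning (low word, carry out).
-- Hypothetical helper; carryIn is expected to be 0 or 1.
addWordCarry :: Word -> Word -> Word -> (Word, Bool)
addWordCarry (W# x) (W# y) (W# cin) =
  case addWordC# x y of
    (# r1, o1 #) -> case addWordC# r1 cin of
      -- The carry flag is only documented as "nonzero on overflow",
      -- so compare against 0 rather than assuming it is exactly 1.
      (# r2, o2 #) -> (W# r2, I# (orI# o1 o2) /= 0)
```

For example, `addWordCarry maxBound 1 0` wraps around to a low word of 0 with the carry set. Whether this compiles to anything better than the plusWord2# version would have to be checked in the generated assembly.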