Pure high-bit multiplication in assembly?

Question

To implement real numbers between 0 and 1, one usually uses IEEE floats or doubles. But fixed-precision numbers between 0 and 1 (fractions modulo 1) can be implemented efficiently as 32-bit integers or 16-bit words. They add like normal integers/words, but they multiply the "wrong way": when you multiply X by Y, you keep the high-order bits of the product rather than the low-order ones. This is equivalent to multiplying 0.X and 0.Y, where all the bits of X are behind the binary point. Likewise, signed numbers between -1 and 1 can be implemented this way with one extra bit and a shift.
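To make the idea concrete, here is a minimal scalar sketch in plain C for the unsigned 32-bit case. The names `ufrac32_mul` and `ufrac32_add` are made up for illustration, and the product is truncated rather than rounded.

```c
#include <stdint.h>

/* Hypothetical helper: x and y encode the fractions x / 2^32 and y / 2^32.
   The full product needs 64 bits; its high 32 bits are the truncated result. */
static uint32_t ufrac32_mul(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* Addition is plain unsigned addition, which wraps modulo 2^32, i.e. modulo 1. */
static uint32_t ufrac32_add(uint32_t x, uint32_t y) {
    return x + y;
}
```

For example, 0x80000000 encodes 0.5, so `ufrac32_mul(0x80000000u, 0x80000000u)` gives 0x40000000, which encodes 0.25.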

How would one implement fixed-precision mod 1 or mod 2 arithmetic in C (especially using MMX or SSE)? I think this representation could be useful for representing unitary matrices efficiently in numerically intensive physics simulations. Integer quantities make better use of MMX/SSE, but you need higher-level access to PMULHW.

Gunther Piez · Accepted Answer

If 16 bit fixed point arithmetic is sufficient and you are on x86 or a similar architecture, you can directly use SSE.

The SSSE3 instruction pmulhrsw directly implements signed 0.15 fixed-point multiplication with rounding (mod 2 as you call it, covering the range -1..+1) in hardware. Addition is no different from ordinary 16-bit vector addition: just use paddw.
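For reference, here is a scalar sketch of what a single 16-bit lane of pmulhrsw computes, as I read Intel's description of the instruction. The name `q15_mulhrs` is made up, and the code assumes `>>` on a negative `int32_t` is an arithmetic shift, which is implementation-defined in C but holds on mainstream compilers.

```c
#include <stdint.h>

/* Hypothetical scalar model of one pmulhrsw lane.
   a and b are signed 0.15 values (value = a / 32768).
   Assumes >> on negative int32_t is an arithmetic shift. */
static int16_t q15_mulhrs(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b;   /* exact 0.30 product */
    /* Drop 14 bits, add 1 to round to nearest, then drop one more bit. */
    return (int16_t)(((p >> 14) + 1) >> 1);
}
```

For example, 16384 encodes 0.5, and `q15_mulhrs(16384, 16384)` gives 8192, which encodes 0.25. The one awkward input is -1 times -1: the exact result +1 is not representable in 0.15 format, so the hardware result wraps back to -1.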

So a library which handles multiplication and addition of eight signed 16 bit fixed point variables at a time could look like this:

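#include <tmmintrin.h>  /* SSSE3 intrinsics; build with -mssse3 */

/* Eight signed 0.15 fixed-point values packed in one 128-bit register. */
typedef __m128i fixed16_t;

/* Lane-wise rounded product, (a * b + 0x4000) >> 15 (pmulhrsw). */
static inline fixed16_t mul(fixed16_t a, fixed16_t b) {
    return _mm_mulhrs_epi16(a, b);
}

/* Lane-wise wrapping 16-bit addition (paddw). */
static inline fixed16_t add(fixed16_t a, fixed16_t b) {
    return _mm_add_epi16(a, b);
}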
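The code above covers the signed case. For the unsigned 0..1 representation from the question (mod 1), a similar sketch should work with the SSE2 intrinsic `_mm_mulhi_epu16` (pmulhuw), which keeps the high 16 bits of each unsigned 16x16 product without rounding. The helper name `ufrac16_mul8` and the array-based interface are assumptions chosen to keep the example self-contained.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_storeu_si128, _mm_mulhi_epu16 */
#include <stdint.h>

/* Hypothetical helper, not part of the library above:
   multiply eight unsigned 0.16 fractions (value = x / 65536) lane by lane,
   keeping the high 16 bits of each 32-bit product (truncating). */
static void ufrac16_mul8(const uint16_t a[8], const uint16_t b[8],
                         uint16_t out[8]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_mulhi_epu16(va, vb));
}
```

With every lane of `a` set to 0x8000 (0.5) and every lane of `b` set to 0x4000 (0.25), each output lane is 0x2000 (0.125). Addition is again just `_mm_add_epi16`, since wrapping 16-bit addition is the same for signed and unsigned lanes.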

Permission granted to use it in any way you like ;-)

Tags: c, x86, assembly

Ron Maimon
