Why perform multiplication in this way?

Question

I've run into this function:

static inline INT32 MPY48SR(INT16 o16, INT32 o32)
{
    UINT32   Temp0;
    INT32    Temp1;
    // A1. get the lower 16 bits of the 32-bit param
    // A2. multiply them with the 16-bit param
    // A3. add 16384 (TODO: why?)
    // A4. bitshift to the right by 15 (TODO: why 15?)
    Temp0 = (((UINT16)o32 * o16) + 0x4000) >> 15;
    // B1. Get the higher 16 bits of the 32-bit param
    // B2. Multiply them with the 16-bit param
    Temp1 = (INT16)(o32 >> 16) * o16;
    // 1. Shift B to the left (TODO: why do this?)
    // 2. Combine with A and return
    return (Temp1 << 1) + Temp0;
}

The inline comments are mine. It seems that all it's doing is multiplying the two arguments. Is this right, or is there more to it? Why would this be done in such a way?

Derek Ledbetter · Accepted Answer

Those parameters don't represent integers. They represent real numbers in fixed-point format with 15 bits to the right of the radix point. For instance, 1.0 is represented by 1 << 15 = 0x8000, 0.5 is 0x4000, -0.5 is 0xC000 (or 0xFFFFC000 in 32 bits).

Adding fixed-point numbers is simple, because you can just add their integer representation. But if you want to multiply, you first have to multiply them as integers, but then you have twice as many bits to the right of the radix point, so you have to discard the excess by shifting. For instance, if you want to multiply 0.5 by itself in 32-bit format, you multiply 0x00004000 (1 << 14) by itself to get 0x10000000 (1 << 28), then shift right by 15 bits to get 0x00002000 (1 << 13). To get better accuracy, when you discard the lowest 15-bits, you want to round to the nearest number, not round down. You can do this by adding 0x4000 = 1 << 14. Then if the discarded 15 bits is less than 0x4000, it gets rounded down, and if it's 0x4000 or more, it gets rounded up.

 (0x3FFF + 0x4000) >> 15 = 0x7FFF >> 15 = 0
 (0x4000 + 0x4000) >> 15 = 0x8000 >> 15 = 1

To sum up, you can do the multiplication like this:

 return (o32 * o16 + 0x4000) >> 15;

But there's a problem. In C++, the result of a multiplication has the same type as its operands. So o16 is promoted to the same size as o32, then they are multiplied to get a 32-bit result. But this throws away the top bits, because the product needs 16 + 32 = 48 bits for accurate representation. One way to do this is to cast the operands to 64 bits and then multiply, but that might be slower, and it's not supported on all machines. So instead it breaks o32 into two 16-bit pieces, then does two multiplications in 32-bits, and combines the results.

anatolyg · Answer

This implements multiplication of fixed-point numbers. The numbers are viewed as being in the Q15 format (having 15 bits in the fractional part).

Mathematically, this function calculates (o16 * o32) / 2^15, rounded to nearest integer (hence the 2^14 factor, which represents 1/2, added to a number in order to round it). It uses unsigned and signed 16-bit multiplications with 32-bit result, which are presumably supported by the instruction set.

Note that there exists a corner case, where each of the numbers has a minimal value (-2^15 and -2^31); in this case, the result (2^31) is not representable in the output, and gets wrapped over (becomes -2^31 instead). For all other combinations of o16 and o32, the result is correct.

Why perform multiplication in this way?

Tags:

c++

multiplication

fixed-point

mpenkov

2 Answers

Derek Ledbetter

anatolyg

Recent Activity

Donate For Us

Why perform multiplication in this way?

Tags:

c++

multiplication

fixed-point

mpenkov

2 Answers

Derek Ledbetter

anatolyg

Related questions

Recent Activity

Donate For Us