64-bit fixed-point multiplication error

I'm implementing a 64-bit fixed-point signed 31.32 numeric type in C#, based on long. So far so good for addition and subtraction. Multiplication, however, has an annoying case I'm trying to solve.

My current algorithm consists of splitting each operand into its most and least significant 32 bits, performing 4 multiplications into 4 longs, and adding the relevant bits of these longs. Here it is in code:

public static Fix64 operator *(Fix64 x, Fix64 y) {

    var xl = x.m_rawValue; // underlying long of x
    var yl = y.m_rawValue; // underlying long of y

    var xlow = xl & 0x00000000FFFFFFFF; // take the 32 lowest bits of x
    var xhigh = xl >> 32; // take the 32 highest bits of x
    var ylow = yl & 0x00000000FFFFFFFF; // take the 32 lowest bits of y
    var yhigh = yl >> 32; // take the 32 highest bits of y

    // perform multiplications
    var lowlow = xlow * ylow;
    var lowhigh = xlow * yhigh;
    var highlow = xhigh * ylow;
    var highhigh = xhigh * yhigh;

    // take the highest bits of lowlow and the lowest of highhigh
    var loResult = lowlow >> 32;
    var midResult1 = lowhigh;
    var midResult2 = highlow;
    var hiResult = highhigh << 32;

    // add everything together and build result
    var finalResult = loResult + midResult1 + midResult2 + hiResult;
    return new Fix64(finalResult); // this constructor just copies the parameter into m_rawValue
}

This works in the general case but fails in a number of scenarios. Namely, the result is off by 1.0 (decimal value), often for extremely small or large values of the operands. Here are some results from my unit tests (FromRaw() is a method that builds a Fix64 directly from a long value, without shifting it):

Failed for FromRaw(-1) * FromRaw(-1): expected 0 but got -1
Failed for FromRaw(-4) * FromRaw(6791302811978701836): expected -1.4726290525868535041809082031 but got -2,4726290525868535041809082031
Failed for FromRaw(2265950765) * FromRaw(17179869183): expected 2.1103311001788824796676635742 but got 1,1103311001788824796676635742

I'm trying to work out the logic of this on paper but I'm a bit stuck. How can I fix this?

The algorithm looks sound; I worked it out "on paper" and it seems right. Here are my worked-out notes for FromRaw(2265950765) * FromRaw(17179869183) (0.52758277510292828083038330078125 * 3.99999999976716935634613037109375 = 2.11033110017888247966766357421875):

x1 = 2265950765
y1 = 17179869183

xlow = 2265950765
xhigh = 0
ylow = 4294967295
yhigh = 3

lowlow = 9732184427755230675
lowhigh = 6797852295
highlow = 0
highhigh = 0

loResult = 2265950764
midResult1 = 6797852295
midResult2 = 0
hiResult = 0

finalResult = 9063803059

Now here's what I suspect is happening: lowlow needs to be a ulong for the result to come out right, but I think that what you're getting is a signed value. Interpreted as signed, lowlow ends up being -8714559645954320941 (too low by 2^64), loResult ends up being -2029016532 (too low by 2^32), finalResult ends up being 4768835763 (also too low by 2^32), and the resulting value is then 1.11033110017888247966766357421875 which is exactly 1 less than you expect.
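To spell out that error propagation (writing $L$ for the true, unsigned value of lowlow; the symbol is just shorthand introduced here), the signed interpretation only kicks in when $L \ge 2^{63}$, and then:

$$
\begin{aligned}
L_{\text{signed}} &= L - 2^{64} \\
\left\lfloor \frac{L_{\text{signed}}}{2^{32}} \right\rfloor &= \left\lfloor \frac{L}{2^{32}} \right\rfloor - 2^{32} \\
2265950764 - 4294967296 &= -2029016532 \\
9063803059 - 4294967296 &= 4768835763
\end{aligned}
$$

Since a raw difference of $2^{32}$ is exactly 1.0 in a format with 32 fractional bits, the product comes out 1.0 too low.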

In general your values should be treated as having a signed "upper half" and an unsigned "lower half". highhigh is signed * signed = signed; lowhigh and highlow are signed * unsigned = signed; but lowlow is unsigned * unsigned = unsigned.
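A minimal sketch of what that could look like, assuming a hypothetical static helper MultiplyRaw that works on the raw long values rather than on Fix64 itself (the class and method names are made up for illustration, not taken from the question). The only changes from the original are that the low halves and lowlow are ulong, with explicit casts where a signed half meets an unsigned half, since C# won't multiply a long by a ulong directly:

```csharp
static class Fix64Sketch
{
    // Hypothetical helper: multiplies two raw 31.32 values, returns the raw result.
    public static long MultiplyRaw(long xl, long yl)
    {
        // Low halves are unsigned, high halves keep their sign.
        ulong xlow = (ulong)(xl & 0xFFFFFFFFL);
        long xhigh = xl >> 32;
        ulong ylow = (ulong)(yl & 0xFFFFFFFFL);
        long yhigh = yl >> 32;

        ulong lowlow = xlow * ylow;          // unsigned * unsigned = unsigned
        long lowhigh = (long)xlow * yhigh;   // unsigned * signed = signed
        long highlow = xhigh * (long)ylow;   // signed * unsigned = signed
        long highhigh = xhigh * yhigh;       // signed * signed = signed

        // Logical shift on the ulong, so no sign bit leaks into the result.
        long loResult = (long)(lowlow >> 32);
        long hiResult = highhigh << 32;

        // Wraps silently on overflow; no overflow detection is attempted here.
        return unchecked(loResult + lowhigh + highlow + hiResult);
    }
}
```

With this, MultiplyRaw(-1, -1) gives 0 and MultiplyRaw(2265950765, 17179869183) gives 9063803059, which is the finalResult from my notes. It makes no attempt to detect overflow of the final result.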

Tags: c#, math, binary, fixed-point

Asked by Asik. 1 answer, by hobbs.
