
Is floating-point math consistent in C#? Can it be?


I know of no way to make normal floating-point math deterministic in .NET. The JIT compiler is allowed to generate code that behaves differently on different platforms (or between different versions of .NET). So using normal floats in deterministic .NET code is not possible.

The workarounds I considered:

  1. Implement FixedPoint32 in C#. While this is not too hard (I have a half-finished implementation), the very small range of values makes it annoying to use. You have to be careful at all times so that you neither overflow nor lose too much precision. In the end I found this no easier than using integers directly.
  2. Implement FixedPoint64 in C#. I found this rather hard to do. For some operations, 128-bit intermediate integers would be useful, but .NET doesn't offer such a type.
  3. Implement a custom 32-bit floating-point type. The lack of a BitScanReverse intrinsic causes a few annoyances when implementing this, but currently I think this is the most promising path.
  4. Use native code for the math operations. This incurs the overhead of a delegate call on every math operation.
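
On the missing BitScanReverse intrinsic mentioned in workaround 3: a software float needs to find the highest set bit to renormalize a mantissa. Below is a minimal sketch of a portable substitute. The names `BitUtil` and `HighestSetBit` are hypothetical and not taken from any linked project. On newer runtimes (.NET Core 3.0 and later), `System.Numerics.BitOperations.LeadingZeroCount` can likely be used instead.

```csharp
public static class BitUtil
{
    // Returns the index (0..31) of the most significant set bit of x,
    // or -1 when x is zero. Only integer operations are used, so the
    // result does not depend on the platform's floating-point behaviour.
    public static int HighestSetBit(uint x)
    {
        if (x == 0) return -1;
        int index = 0;
        // Binary search: test the upper half of the remaining bits each time.
        if ((x & 0xFFFF0000u) != 0) { x >>= 16; index += 16; }
        if ((x & 0x0000FF00u) != 0) { x >>= 8; index += 8; }
        if ((x & 0x000000F0u) != 0) { x >>= 4; index += 4; }
        if ((x & 0x0000000Cu) != 0) { x >>= 2; index += 2; }
        if ((x & 0x00000002u) != 0) { index += 1; }
        return index;
    }
}
```

For a 24-bit mantissa, the shift needed to renormalize would then be `23 - BitUtil.HighestSetBit(mantissa)`.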

I've just started a software implementation of 32-bit floating-point math. It can do about 70 million additions/multiplications per second on my 2.66 GHz i3. https://github.com/CodesInChaos/SoftFloat . Obviously it's still very incomplete and buggy.


The C# specification (§4.1.6 Floating point types) specifically allows floating point computations to be done using precision higher than that of the result. So, no, I don't think you can make those calculations deterministic directly in .Net. Others suggested various workarounds, so you could try them.


The following page may be useful in the case where you need absolute portability of such operations. It discusses software for testing implementations of the IEEE 754 standard, including software for emulating floating point operations. Most information is probably specific to C or C++, however.

http://www.math.utah.edu/~beebe/software/ieee/

A note on fixed point

Binary fixed point numbers can also work well as a substitute for floating point, as is evident from the four basic arithmetic operations:

  • Addition and subtraction are trivial. They work the same way as integers. Just add or subtract!
  • To multiply two fixed point numbers, multiply the two numbers, then shift the product right by the defined number of fractional bits.
  • To divide two fixed point numbers, shift the dividend left by the defined number of fractional bits, then divide by the divisor.
  • Chapter four of Hattangady (2007) has additional guidance on implementing binary fixed point numbers (S.K. Hattangady, "Development of a Block Floating Point Interval ALU for DSP and Control Applications", Master's thesis, North Carolina State University, 2007).
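
The four operations above can be sketched as follows. This assumes a 16.16 layout stored in an `int` (16 integer bits, 16 fractional bits); the class name `Fixed16` and its members are hypothetical. The intermediate product and the shifted dividend are held in a `long` so they do not overflow 32 bits.

```csharp
public static class Fixed16
{
    public const int FractionBits = 16;        // assumed layout: 16.16
    public const int One = 1 << FractionBits;  // raw representation of 1.0

    public static int FromInt(int value) => value << FractionBits;

    // Addition and subtraction are plain integer operations.
    public static int Add(int a, int b) => a + b;
    public static int Sub(int a, int b) => a - b;

    // Multiply in 64 bits, then shift right by the number of fractional bits.
    public static int Mul(int a, int b) => (int)(((long)a * b) >> FractionBits);

    // Shift the dividend left in 64 bits, then divide by the divisor.
    public static int Div(int a, int b) => (int)(((long)a << FractionBits) / b);
}
```

Note that the right shift in `Mul` rounds toward negative infinity while the integer division in `Div` truncates toward zero; both are fully specified for integers, so every platform gives the same result. Overflow of the final 32-bit result is not checked in this sketch.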

Binary fixed point numbers can be implemented on any integer data type such as int, long, and BigInteger, and the non-CLS-compliant types uint and ulong.

As suggested in another answer, you can use lookup tables, where each element in the table is a binary fixed point number, to help implement complex functions such as sine, cosine, square root, and so on. If the lookup table is less granular than the fixed point number, it is suggested to round the input by adding one half of the granularity of the lookup table to the input:

// Assume each number has a 12-bit fractional part (units of 1/4096).
// Each entry in the lookup table corresponds to a fixed point number
// with an 8-bit fractional part (units of 1/256).
input += (1 << 3); // Add 2^3 (half of 2^4) so the shift below rounds to nearest
input >>= 4;       // Shift right by 4 (to get an 8-bit fractional part)
// --- clamp or restrict input here ---
// Look up the value.
return lookupTable[input];

Is this a problem for C#?

Yes. Different architectures are the least of your worries; different frame rates and the like can lead to deviations due to inaccuracies in float representations, even if they are the same inaccuracies (e.g. the same architecture, but a slower GPU on one machine).

Can I use System.Decimal?

There is no reason you can't; however, it's dog slow.

Is there a way to force my program to run in double precision?

Yes. Host the CLR yourself, and compile all the necessary calls/flags (those that change the behaviour of floating-point arithmetic) into the C++ application before calling CorBindToRuntimeEx.

Are there any libraries that would help keep floating point calculations consistent?

Not that I know of.

Is there another way to solve this?

I have tackled this problem before; the idea is to use QNumbers. They are a form of fixed-point reals, but fixed point in base 2 (binary) rather than base 10 (decimal). Because of this, the mathematical primitives on them (add, sub, mul, div) are much faster than on naive base-10 fixed-point numbers, especially if n (the number of fractional bits) is the same for both values (which in your case it would be). Furthermore, because they are integral, they have well-defined results on every platform.

Keep in mind that framerate can still affect these, but it is not as bad and is easily rectified using synchronisation points.

Can I use more mathematical functions with QNumbers?

Yes; round-trip through a decimal to do this. Furthermore, you should really be using lookup tables for the trig functions (sin, cos), as those can really give different results on different platforms. If you code them correctly, they can use QNumbers directly.
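
A minimal sketch of such a trig lookup table, under assumptions of my own: angles are measured in steps of 1/32 of a full turn, results are fixed-point values with 16 fractional bits, and the names `SinTable` and `QuarterWave` are hypothetical. The table is hard-coded so that no floating-point code runs at all. A real table would be much finer and would probably interpolate between entries.

```csharp
public static class SinTable
{
    // sin(k * pi / 16) for k = 0..8, scaled by 65536 and rounded.
    private static readonly int[] QuarterWave =
    {
        0, 12785, 25080, 36410, 46341, 54491, 60547, 64277, 65536
    };

    // 'step' is an angle in units of 1/32 of a full turn; any int is accepted.
    public static int Sin(int step)
    {
        int s = step & 31;  // wrap into one full turn (0..31), also for negatives
        int q = s & 15;     // position inside the current half turn (0..15)
        // Mirror the second quarter of each half turn onto the table.
        int value = q <= 8 ? QuarterWave[q] : QuarterWave[16 - q];
        return s < 16 ? value : -value;  // the second half turn is negative
    }
}
```

Cosine can reuse the same table as `SinTable.Sin(step + 8)`, since a quarter turn is 8 steps here.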


According to this slightly old MSDN blog entry, the JIT will not use SSE/SSE2 for floating point; it's all x87. Because of that, as you mentioned, you have to worry about modes and flags, and in C# those are not possible to control. So using normal floating point operations will not guarantee the exact same result on every machine for your program.

To get precise reproducibility of double precision you are going to have to do software floating point (or fixed point) emulation. I don't know of C# libraries to do this.

Depending on the operations you need, you might be able to get away with single precision. Here's the idea:

  • store all values you care about in single precision
  • to perform an operation:
    • expand inputs to double precision
    • do operation in double precision
    • convert result back to single precision

The big issue with x87 is that calculations might be done in 53-bit or 64-bit accuracy depending on the precision flag and whether the register spilled to memory. But for many operations, performing the operation in high precision and rounding back to lower precision will guarantee the correct answer, which implies that the answer will be guaranteed to be the same on all systems. Whether you get the extra precision won't matter, since you have enough precision to guarantee the right answer in either case.

Operations that should work in this scheme: addition, subtraction, multiplication, division, sqrt. Things like sin, exp, etc. won't work (results will usually match, but there is no guarantee). See "When is double rounding innocuous?" (ACM reference; paid registration required).
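
The scheme can be sketched like this. The class name `StableSingle` is hypothetical, and the sketch assumes that the explicit `(float)` cast narrows the value to true single precision on the runtime in use; I have not verified that on every JIT.

```csharp
public static class StableSingle
{
    // Each operation widens its inputs to double, computes once,
    // and rounds the result back to single precision.
    public static float Add(float a, float b) => (float)((double)a + (double)b);
    public static float Sub(float a, float b) => (float)((double)a - (double)b);
    public static float Mul(float a, float b) => (float)((double)a * (double)b);
    public static float Div(float a, float b) => (float)((double)a / (double)b);
    public static float Sqrt(float a) => (float)System.Math.Sqrt((double)a);
}
```

The important part is that only single-precision values are ever stored or passed between operations; chaining several double-precision steps before rounding would lose the guarantee.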

Hope this helps!


As already stated by other answers: Yes, this is a problem in C# - even when staying pure Windows.

As for a solution: You can reduce the problem, and with some effort and a performance hit avoid it completely, by using the built-in BigInteger class and scaling all calculations to a defined precision, with a common denominator for every calculation and stored value.

As requested by OP - regarding performance:

System.Decimal represents a number with 1 bit for the sign, a 96-bit integer, and a "scale" (representing where the decimal point is). Every calculation must operate on this data structure and can't use any floating point instructions built into the CPU.
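
To see that structure directly, `decimal.GetBits` returns the four 32-bit words of a decimal. A small sketch (the class name `DecimalLayout` and method `Describe` are hypothetical):

```csharp
public static class DecimalLayout
{
    // Splits a decimal into its 96-bit integer (three 32-bit words),
    // its scale (the power of ten it is divided by), and its sign.
    public static (int Lo, int Mid, int Hi, int Scale, bool Negative) Describe(decimal value)
    {
        int[] bits = decimal.GetBits(value);  // { lo, mid, hi, flags }
        int flags = bits[3];
        int scale = (flags >> 16) & 0xFF;     // bits 16..23 hold the scale
        bool negative = flags < 0;            // bit 31 holds the sign
        return (bits[0], bits[1], bits[2], scale, negative);
    }
}
```

For example, `-1.5m` is stored as the integer 15 with scale 1 and the sign bit set.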

The BigInteger "solution" does something similar, except that you can define how many digits you need or want... perhaps you want only 80 bits or 240 bits of precision.
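
A rough sketch of the BigInteger approach, assuming a common denominator of 2^80 (so 80 fractional bits; that number is only an illustration). The class name `ScaledBig` and its members are hypothetical.

```csharp
using System.Numerics;

public static class ScaledBig
{
    // Every value v is stored as the integer v * 2^FractionBits.
    public const int FractionBits = 80;

    public static BigInteger FromInt(int value) => (BigInteger)value << FractionBits;

    // Same denominator on both sides, so addition needs no rescaling.
    public static BigInteger Add(BigInteger a, BigInteger b) => a + b;

    // The raw product has twice the fractional bits; shift back down.
    public static BigInteger Mul(BigInteger a, BigInteger b) => (a * b) >> FractionBits;

    // Pre-shift the dividend so the quotient keeps the same denominator.
    public static BigInteger Div(BigInteger a, BigInteger b) => (a << FractionBits) / b;
}
```

The integer part can grow without overflow, which is the main convenience over a fixed-width integer; the price is allocation and slower arithmetic on every operation.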

The slowness always comes from having to simulate all operations on these numbers via integer-only instructions, without using the instructions built into the CPU/FPU, which in turn means many more instructions per mathematical operation.

To lessen the performance hit there are several strategies, such as QNumbers (see the answer from Jonathan Dickinson to this question) and/or caching (for example, of trig calculations).


Well, here would be my first attempt on how to do this:

  1. Create an ATL .dll project that has a simple object in it to be used for your critical floating point operations. Make sure to compile it with flags that disable using any non-x87 hardware to do floating point.
  2. Create functions that call floating point operations and return the results. Start simple; if it's working for you, you can always increase the complexity to meet your performance needs later if necessary.
  3. Put _controlfp calls around the actual math to ensure that it's done the same way on all machines.
  4. Reference your new library and test to make sure it works as expected.

(I believe you can just compile to a 32-bit .dll and then use it with either x86 or AnyCpu [or likely only targeting x86 on a 64-bit system; see comment below].)

Then, assuming it works, should you want to use Mono I imagine you should be able to replicate the library on other x86 platforms in a similar manner (not COM of course; although, perhaps, with wine? a little out of my area once we go there though...).

Assuming you can make it work, you should be able to set up custom functions that can do multiple operations at once to fix any performance issues, and you'll have floating point math that allows you to have consistent results across platforms with a minimal amount of code written in C++, and leaving the rest of your code in C#.