Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write portable floating point arithmetic in c++?

Say you're writing a C++ application doing lots of floating point arithmetic. Say this application needs to be portable accross a reasonable range of hardware and OS platforms (say 32 and 64 bits hardware, Windows and Linux both in 32 and 64 bits flavors...).

How would you make sure that your floating point arithmetic is the same on all platforms ? For instance, how to be sure that a 32 bits floating point value will really be 32 bits on all platforms ?

For integers we have stdint.h but there doesn't seem to exist a floating point equivalent.


[EDIT]

I got very interesting answers but I'd like to add some precision to the question.

For integers, I can write:

#include <stdint>
[...]
int32_t myInt;

and be sure that whatever the (C99 compatible) platform I'm on, myInt is a 32 bits integer.

If I write:

double myDouble;
float myFloat;

am I certain that this will compile to, respectively, 64 bits and 32 bits floating point numbers on all platforms ?

like image 990
Stéphane Bonniez Avatar asked Jun 11 '09 17:06

Stéphane Bonniez


People also ask

How do you represent a floating-point in C?

Any number that has a decimal point in it will be interpreted by the compiler as a floating-point number. Note that you have to put at least one digit after the decimal point: 2.0, 3.75, -12.6112. You can specific a floating point number in scientific notation using e for the exponent: 6.022e23.

How do you perform arithmetic with floating-point numbers?

Arithmetic operations on floating point numbers consist of addition, subtraction, multiplication and division. The operations are done with algorithms similar to those used on sign magnitude integers (because of the similarity of representation) — example, only add numbers of the same sign.

What is floating point arithmetic in coding?

In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base.

Is C float 32 or 64?

float is a 32-bit IEEE 754 single precision Floating Point Number – 1 bit for the sign, 8 bits for the exponent, and 23* for the value. float has 7 decimal digits of precision.


2 Answers

Non-IEEE 754

Generally, you cannot. There's always a trade-off between consistency and performance, and C++ hands that to you.

For platforms that don't have floating point operations (like embedded and signal processing processors), you cannot use C++ "native" floating point operations, at least not portably so. While a software layer would be possible, that's certainly not feasible for this type of devices.

For these, you could use 16 bit or 32 bit fixed point arithmetic (but you might even discover that long is supported only rudimentary - and frequently, div is very expensive). However, this will be much slower than built-in fixed-point arithmetic, and becomes painful after the basic four operations.

I haven't come across devices that support floating point in a different format than IEEE 754. From my experience, your best bet is to hope for the standard, because otherwise you usually end up building algorithms and code around the capabilities of the device. When sin(x) suddenly costs 1000 times as much, you better pick an algorithm that doesn't need it.

IEEE 754 - Consistency

The only non-portability I found here is when you expect bit-identical results across platforms. The biggest influence is the optimizer. Again, you can trade accuracy and speed for consistency. Most compilers have a option for that - e.g. "floating point consistency" in Visual C++. But note that this is always accuracy beyond the guarantees of the standard.

Why results become inconsistent? First, FPU registers often have higher resolution than double's (e.g. 80 bit), so as long as the code generator doesn't store the value back, intermediate values are held with higher accuracy.

Second, the equivalences like a*(b+c) = a*b + a*c are not exact due to the limited precision. Nonetheless the optimizer, if allowed, may make use of them.

Also - what I learned the hard way - printing and parsing functions are not necessarily consistent across platforms, probably due to numeric inaccuracies, too.

float

It is a common misconception that float operations are intrinsically faster than double. working on large float arrays is faster usually through less cache misses alone.

Be careful with float accuracy. it can be "good enough" for a long time, but I've often seen it fail faster than expected. Float-based FFT's can be much faster due to SIMD support, but generate notable artefacts quite early for audio processing.

like image 169
peterchen Avatar answered Sep 27 '22 17:09

peterchen


Use fixed point.

However, if you want to approach the realm of possibly making portable floating point operations, you at least need to use controlfp to ensure consistent FPU behavior as well as ensuring that the compiler enforces ANSI conformance with respect to floating point operations. Why ANSI? Because it's a standard.

And even then you aren't guaranteeing that you can generate identical floating point behavior; that also depends on the CPU/FPU you are running on.

like image 36
MSN Avatar answered Sep 27 '22 15:09

MSN