I'm working on a processor without a floating-point unit, so I have to use fixed-point or a custom floating-point type for a user interface.
What does the performance of, say, a multiply look like for these three types (fixed-point, a custom float class, and IEEE float)?
I want something that will scale to a processor with a floating-point unit as well. Will the custom float be competitive performance-wise with an IEEE float? I've heard the performance of IEEE floats is terrible on processors without FPUs; is that because of all the masking, shifting and OR-ing needed since the 24-bit significand isn't a native integer size? That is, would a custom float class mitigate that performance problem?
Any help would be greatly appreciated!
Consider a 16-bit fixed-point format where 10 of the 16 bits represent the fractional part and 6 bits the integer part (a 6.10 format). Fixed-point arithmetic is widely used in FPGA-based algorithms because it usually runs faster and uses fewer resources than floating-point arithmetic.
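For example, a multiply in that 6.10 format is just an integer multiply carried out in a wider type, followed by a shift. A minimal sketch (the fix6_10 type and function names are mine, purely for illustration):

#include <cstdint>
#include <cstdio>

// 6.10 fixed point: 6 integer bits, 10 fractional bits in an int16_t.
typedef int16_t fix6_10;
const int FRAC_BITS = 10;

fix6_10 to_fix(double x)     { return (fix6_10)(x * (1 << FRAC_BITS)); }
double  to_double(fix6_10 x) { return (double)x / (1 << FRAC_BITS); }

// The product of two 6.10 numbers is a 12.20 number in 32 bits;
// shifting right by 10 truncates it back down to 6.10.
fix6_10 fix_mul(fix6_10 a, fix6_10 b)
{
    int32_t wide = (int32_t)a * (int32_t)b;
    return (fix6_10)(wide >> FRAC_BITS);
}

int main()
{
    fix6_10 a = to_fix(1.5), b = to_fix(2.25);
    printf("%f\n", to_double(fix_mul(a, b))); // prints 3.375000
    return 0;
}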
float and double differ in the number of decimal digits they can hold accurately: float can hold up to about 7 significant decimal digits, while double can hold up to about 15.
You would get a more accurate result if you did the calculation with ten digits to the right of the decimal point (3.1415926535) rather than fewer. For computers, this level of accuracy is called precision, and it's measured in binary digits (bits) instead of decimal places. The more bits used, the higher the precision.
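A quick way to see the 7-digit vs. 15-digit difference, assuming a hosted C++ environment:

#include <cstdio>

int main()
{
    // float keeps about 7 significant decimal digits, double about 15.
    float  pi_f = 3.14159265358979f;
    double pi_d = 3.14159265358979;
    printf("%.13f\n", pi_f); // 3.1415927410126 -- wrong past ~7 digits
    printf("%.13f\n", pi_d); // 3.1415926535898 -- accurate to ~15 digits
    return 0;
}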
The IEEE single-precision floating-point format uses 23 fraction bits F, 8 exponent bits E, and 1 sign bit S, for a total of 32 bits per word. F is the fractional part of the significand, stored as an unsigned binary fraction in bits 0 through 22; the leading 1 of a normalized value is implicit, and the sign is carried separately in S (sign-magnitude, not 2's complement).
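You can see the three fields by pulling a float apart. A small sketch (decompose is my name for it; memcpy is used to avoid strict-aliasing problems):

#include <cstdint>
#include <cstdio>
#include <cstring>

// Split an IEEE-754 single into its S, E and F fields.
void decompose(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t s = bits >> 31;           // 1 sign bit
    uint32_t e = (bits >> 23) & 0xFF;  // 8 exponent bits, biased by 127
    uint32_t m = bits & 0x7FFFFF;      // 23 fraction bits, implicit leading 1
    printf("S=%u E=%u (unbiased %d) F=0x%06X\n",
           (unsigned)s, (unsigned)e, (int)e - 127, (unsigned)m);
}

int main()
{
    decompose(0.15625f); // S=0 E=124 (unbiased -3) F=0x200000
    return 0;
}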
Software-emulated IEEE floats/doubles are slow because of the many steps and edge cases each operation must check for and properly handle:

- unpack the sign, exponent and mantissa of both operands
- detect special values (zeros, denormals, NaNs, infinities) and handle them separately
- restore the implicit leading 1 of each mantissa
- align the exponents (for addition) or add them (for multiplication)
- perform the integer operation on the mantissas
- normalize the result, adjusting the exponent
- round according to the rounding mode
- detect exponent overflow/underflow
- repack the sign, exponent and mantissa into the result word

If you just roughly count the above as primitive micro-operations (one for each item on the list), you get close to 10. There will be many more in the worst case. A stripped-down multiply illustrating most of these steps is sketched below.
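This sketch uses a toy sign/exponent/mantissa format of my own; it deliberately skips zeros, denormals, NaNs, infinities and proper rounding, which is exactly where a real IEEE implementation spends most of its extra work:

#include <cstdint>

// Toy float: value = (mantissa / 2^23) * 2^exponent, with bit 23 of the
// mantissa always set (normalized). Not IEEE -- no special values.
struct ToyFloat {
    uint32_t sign;     // 0 or 1
    int32_t  exponent; // unbiased, for simplicity
    uint32_t mantissa; // 24 bits, normalized
};

ToyFloat toy_mul(ToyFloat a, ToyFloat b)
{
    ToyFloat r;
    r.sign     = a.sign ^ b.sign;                   // combine signs
    r.exponent = a.exponent + b.exponent;           // add exponents
    uint64_t m = (uint64_t)a.mantissa * b.mantissa; // 24x24 -> 48-bit product
    if (m & (1ull << 47)) {                         // normalize: the product
        m >>= 24;                                   // is in [2^46, 2^48), so
        r.exponent += 1;                            // one step is enough
    } else {
        m >>= 23;
    }
    r.mantissa = (uint32_t)m;                       // truncate (no rounding)
    return r;
}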
So, if you're interested in IEEE-compliant floating-point arithmetic, expect every emulated operation to be something like 30x slower than its integer counterpart (CodesInChaos's comment about 38 clocks per addition/multiplication is in line with this).
You could cut some corners by choosing a floating-point format with no denormals, no infinities or NaNs, and simple truncating rounding, at the cost of strict IEEE compliance.
Fixed-point arithmetic may turn out to be much more performant. But the usual problem with it is that you have to know the ranges of all inputs and intermediate results beforehand so you can choose the right format and avoid overflows. You'll also likely need a number of different fixed-point formats, e.g. 16.16, 32.32, 8.24, 0.32. C++ templates may help reduce code duplication here.
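A minimal sketch of that template idea (the Fixed name and interface are mine, not a complete library):

#include <cstdint>

// One template covers 16.16, 8.24, etc.: Storage holds the raw value,
// Wide is a type big enough for intermediate products.
template <typename Storage, typename Wide, int FracBits>
struct Fixed {
    Storage raw;

    static Fixed from_double(double x) { return { (Storage)(x * ((Wide)1 << FracBits)) }; }
    double to_double() const { return (double)raw / (double)((Wide)1 << FracBits); }

    Fixed operator+(Fixed o) const { return { (Storage)(raw + o.raw) }; }
    Fixed operator*(Fixed o) const {
        // Widen first so the intermediate product cannot overflow.
        return { (Storage)(((Wide)raw * o.raw) >> FracBits) };
    }
};

using Fix16_16 = Fixed<int32_t, int64_t, 16>; // 16.16
using Fix8_24  = Fixed<int32_t, int64_t, 24>; // 8.24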
In any event, the best you can do is define your problem, solve it with both floating- and fixed-point arithmetic, measure which of the two performs better on each target CPU, and choose the winner.
EDIT: For an example of a simpler floating-point format, take a look at the MIL-STD-1750A's 32-bit floating point format:
MSB                                 LSB MSB          LSB
--------------------------------------------------------
| S |            Mantissa              |   Exponent    |
--------------------------------------------------------
  0   1                            23    24          31
Floating point numbers are represented as a fractional mantissa times 2 raised to the power of the exponent. All floating point numbers are assumed normalized (or floating point zero) at the beginning of a floating point operation, and the results of all floating point operations are normalized (a normalized floating point number has the sign bit of the mantissa and the next bit of opposite value) or floating point zero. A floating point zero is defined as 0000 0000 (hex), that is, a zero mantissa and a zero exponent (00 hex). An extended (48-bit) floating point zero is defined as 0000 0000 0000 (hex), that is, a zero mantissa and a zero exponent. Some examples of the machine representation for 32-bit floating point numbers:
Decimal Number           Hexadecimal Notation (Mantissa x Exp)
 0.9999998 x 2^127       7FFFFF 7F
 0.5       x 2^127       400000 7F
 0.625     x 2^4         500000 04
 0.5       x 2^1         400000 01
 0.5       x 2^0         400000 00
 0.5       x 2^-1        400000 FF
 0.5       x 2^-128      400000 80
 0.0       x 2^0         000000 00
-1.0       x 2^0         800000 00
-0.5000001 x 2^-128      BFFFFF 80
-0.7500001 x 2^4         9FFFFF 04
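Decoding this format on a host machine takes only a few lines, which shows how much simpler it is than full IEEE handling. A sketch (decode_1750a is my name for it, and it ignores the extended 48-bit format):

#include <cmath>
#include <cstdint>
#include <cstdio>

// A 1750A word is a 24-bit two's complement mantissa fraction in the
// high bits and an 8-bit two's complement exponent in the low byte.
double decode_1750a(uint32_t word)
{
    int32_t mantissa = (int32_t)(word >> 8);
    if (mantissa & 0x800000) mantissa -= 0x1000000; // sign-extend 24 bits
    int exponent = (int)(word & 0xFF);
    if (exponent > 127) exponent -= 256;            // sign-extend 8 bits
    // value = (mantissa / 2^23) * 2^exponent
    return ldexp((double)mantissa, exponent - 23);
}

int main()
{
    printf("%g\n", decode_1750a(0x50000004)); // 0.625 x 2^4 = 10
    printf("%g\n", decode_1750a(0x80000000)); // -1.0 x 2^0 = -1
    return 0;
}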