Should I use floating point's NaN, or floating point + bool for a data set that contains invalid values?


I have a large amount of data to process with math intensive operations on each data set. Much of it is analogous to image processing. However, since this data is read directly from a physical device, many of the pixel values can be invalid.

This makes NaN's behavior very compelling: it represents a value that is not a number, and it propagates through arithmetic operations. However, it also seems to rule out some optimizations such as gcc's -ffast-math, and we need to stay cross-platform. Our current design uses a simple struct that contains a float value and a bool indicating validity.
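
For concreteness, that design is essentially a value paired with a flag, something like the following (the names are illustrative, not the actual code):

    struct Sample {
        float value;
        bool  valid;   // false for readings the device reported as invalid
    };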

While it seems NaN was designed with this use in mind, others think it is more trouble than it is worth. Does anyone with more intimate experience of IEEE 754 have advice, particularly with performance in mind?

asked Mar 05 '12 by Andrew Hundt


2 Answers

BRIEF: For strictest portability, don't use NaNs. Use a separate valid bit, e.g. a template like Valid<T> (sketched below). However, if you know that you will only ever run on IEEE 754-2008 machines, and not IEEE 754-1985 machines (see below), then you may get away with it.

For performance, it is probably faster not to use NaNs on most of the machines that you have access to. However, I have been involved with hardware design of FP on several machines that are improving NaN handling performance, so there is a trend to make NaNs faster, and, in particular, signalling NaNs should soon be faster than Valid<T>.

DETAIL:

Not all floating point formats have NaNs. Not all systems use IEEE floating point. IBM hex floating point can still be found on some machines - or rather on some systems, since IBM now also supports IEEE FP on its more recent machines.

Furthermore, IEEE floating point itself had compatibility issues with respect to NaNs in IEEE 754-1985. E.g., see Wikipedia, http://en.wikipedia.org/wiki/NaN:

The original IEEE 754 standard from 1985 (IEEE 754-1985) only described binary floating point formats, and did not specify how the signaled/quiet state was to be tagged. In practice, the most significant bit of the significand determined whether a NaN is signalling or quiet. Two different implementations, with reversed meanings, resulted.

* Most processors (including those of the Intel/AMD x86-32/x86-64 family, the Motorola 68000 family, the AIM PowerPC family, the ARM family, and the Sun SPARC family) set the signaled/quiet bit to non-zero if the NaN is quiet, and to zero if the NaN is signaling. Thus, on these processors, the bit represents an 'is_quiet' flag.
* In NaNs generated by the PA-RISC and MIPS processors, the signaled/quiet bit is zero if the NaN is quiet, and non-zero if the NaN is signaling. Thus, on these processors, the bit represents an 'is_signaling' flag.

Thus, if your code may run on older HP machines, or on current MIPS machines (which are ubiquitous in embedded systems), you should not depend on a fixed encoding of NaN, but should have a machine-dependent #ifdef for your special NaNs.
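
A hedged sketch of what such a machine-dependent choice might look like (the macro names and bit patterns below are illustrative assumptions, not a complete list; check your compiler and ABI documentation):

    #include <cstdint>

    // Illustrative only: pick a quiet-NaN bit pattern per target family.
    #if defined(__mips__) || defined(__hppa__)
    // Legacy MIPS/PA-RISC convention: significand MSB set means "signaling".
    constexpr std::uint32_t kQuietNaNBits = 0x7FBFFFFFu;
    #else
    // x86/ARM/PowerPC/SPARC convention: significand MSB set means "quiet".
    constexpr std::uint32_t kQuietNaNBits = 0x7FC00000u;
    #endif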

IEEE 754-2008 standardizes NaN encodings, so this is getting better. It depends on your market.

As for performance: many machines essentially trap, or otherwise take a major hiccup in performance, when performing computations involving both SNaNs (which must trap) and QNaNs (which don't need to trap, i.e. which could be fast - and which are getting faster in some machines as we speak.)

I can say with confidence that on older machines, particularly older Intel machines, you did NOT want to use NaNs if you cared about performance. E.g. http://www.cygnus-software.com/papers/x86andinfinity.html says "The Intel Pentium 4 handles infinities, NANs, and denormals very badly. ... If you write code that adds floating point numbers at the rate of one per clock cycle, and then throw infinities at it as input, the performance drops. A lot. A huge amount. ... NANs are even slower. Addition with NANs takes about 930 cycles. ... Denormals are a bit trickier to measure."

Get the picture? Almost 1000x slower to use a NaN than to do a normal floating point operation? In this case it is almost guaranteed that using a template like Valid<T> will be faster.

However, see the reference to "Pentium 4"? That's a really old web page. For years people like me have been saying "QNaNs should be faster", and it has slowly taken hold.

More recently (2009), Microsoft says http://connect.microsoft.com/VisualStudio/feedback/details/498934/big-performance-penalty-for-checking-for-nans-or-infinity "If you do math on arrays of double that contain large numbers of NaN's or Infinities, there is an order of magnitude performance penalty."

If I feel impelled, I may go and run a microbenchmark on some machines. But you should get the picture.

This should be changing because it is not that hard to make QNaNs fast. But it has always been a chicken-and-egg problem: hardware guys like those I work with say "Nobody uses NaNs, so we won't make them fast", while software guys don't use NaNs because they are slow. Still, the tide is slowly changing.

Heck, if you are using gcc and want best performance, you turn on optimizations like "-ffinite-math-only ... Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs." Similar is true for most compilers.
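
One hedged way to guard against that combination, assuming gcc or clang (both define __FAST_MATH__ under -ffast-math), is a compile-time check; this is a sketch, not part of the original answer:

    // If the build enables -ffast-math (which implies -ffinite-math-only),
    // NaN-based validity tracking can silently break; fail the build instead.
    #if defined(__FAST_MATH__)
    #error "NaN-based validity tracking needs IEEE NaN semantics; build without -ffast-math"
    #endif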

By the way, you can google like I did, "NaN performance floating point" and check refs out yourself. And/or run your own microbenchmarks.

Finally, I have been assuming that you are using a template like

    template<typename T>
    class Valid {
        ...
        bool valid;
        T value;
        ...
    };

I like templates like this, because they can bring "validity tracking" not just to FP, but also to integers (Valid<int>), etc.

But they can have a big cost. The operations are probably not much more expensive than NaN handling on old machines, but the data density can be really poor: sizeof(Valid<float>) may sometimes be 2*sizeof(float). This bad density may hurt performance much more than the operations involved.
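
A quick way to see the density cost on your own toolchain (a sketch; the exact sizes are implementation-defined):

    #include <cstdio>

    template<typename T> struct Valid { bool valid; T value; };

    int main() {
        // On a typical ABI, alignment padding makes Valid<float> 8 bytes,
        // i.e. twice sizeof(float).
        std::printf("Valid<float>: %zu bytes, float: %zu bytes\n",
                    sizeof(Valid<float>), sizeof(float));
    }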

By the way, you should consider template specialization, so that Valid<float> uses NaNs if they are available and fast, and a valid bit otherwise.

    #include <cmath>  // for std::isnan

    template <>
    class Valid<float> {
        float value;
    public:
        bool is_valid() const {
            // A NaN never compares equal to anything, so a direct
            // "value != my_special_NaN" test would always return true;
            // check for NaN-ness instead.
            return !std::isnan(value);
        }
    };

etc.

Anyway, you are better off having as few valid bits as possible, and packing them elsewhere, rather than having a Valid<T> right next to each value. E.g.

    struct Point { float x, y, z; };
    Valid<Point> pt;

is better (density wise) than

    struct Point_with_Valid_Coords { Valid<float> x, y, z; };

unless you are using NaNs - or some other special encoding.

And

    struct Point_with_Valid_Coords { float x, y, z; bool valid_x, valid_y, valid_z; };

is in between - but then you have to do all the code yourself.
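
For example, the hand-written validity check for that layout might look like this (a sketch; point_is_valid is not from the original answer):

    // Uses the Point_with_Valid_Coords layout shown just above.
    bool point_is_valid(const Point_with_Valid_Coords& p) {
        return p.valid_x && p.valid_y && p.valid_z;
    }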

BTW, I have been assuming you are using C++. If FORTRAN or Java ...

BOTTOM LINE: a separate valid bit is probably faster and more portable.

But NaN handling is speeding up, and one day soon it will be good enough.

By the way, my preference: create a Valid<T> template. Then you can use it for all data types. Specialize it for NaNs if that helps. Although my life's work is making things faster, IMHO it is usually more important to make the code clean.

answered by Krazy Glew


If invalid data is very common, you are of course wasting a lot of time running that data through the processing. If invalid data is common enough, it is probably better to use some kind of sparse data structure that holds only the valid data. If it is not very common, you can instead keep a sparse data structure recording which data is invalid; that way you would not waste a bool for each value. But maybe memory is not a problem for you...
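
A minimal sketch of the second variant (the names are illustrative, not from the question):

    #include <cstddef>
    #include <vector>

    // Keep the raw values densely packed and record only the indices that are
    // known to be invalid; this list stays small if bad readings are rare.
    struct Frame {
        std::vector<float>       values;
        std::vector<std::size_t> invalid_indices;
    };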

If you are doing operations such as multiplying two possibly invalid data entries, I understand it is compelling to use NaNs instead of checking both variables for validity and then setting the corresponding flag in the result.
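
A hedged sketch of the two styles being compared (the Sample type is an assumption standing in for the question's float-plus-bool struct):

    struct Sample { float value; bool valid; };

    // Explicit-flag style: every operation must merge validity by hand.
    Sample mul(Sample a, Sample b) {
        return { a.value * b.value, a.valid && b.valid };
    }

    // NaN style: arithmetic propagates invalidity for free.
    float mul_nan(float a, float b) {
        return a * b;   // if either input is NaN, the result is NaN
    }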

How portable do you need to be? Will you ever need to be able to port it to an architecture with only fixed point support? If that is the case, I think your choice is clear.

Personally, I would only use NaNs if it proved to be much faster. Otherwise I'd say the code gets clearer if you handle invalid data explicitly.

answered by tombjo