 

Floating point comparison precision

Given 3 IEEE-754 floats a, b, c that are not +/-INF and not NaN and a < b, is it safe to assume that a - c < b - c? Or, can you give an example when this is incorrect?

user16367 asked Nov 16 '12



1 Answer

Suppose a is approximately 0.00000000000000001, b is approximately 0.00000000000000002, and c is 1. Then a − c and b − c will both equal −1.

(That's assuming double-precision, a.k.a. 64-bit, values. For higher-precision values, you'll need to add some more zeroes.)
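A quick sketch of this counterexample in Python (whose `float` is an IEEE-754 double) confirms it:

```python
# a and b are tiny but distinct doubles; c is "large" relative to them.
a = 1e-17   # approximately 0.00000000000000001
b = 2e-17   # approximately 0.00000000000000002
c = 1.0

print(a < b)            # True: a and b are distinct doubles
print(a - c == b - c)   # True: both differences round to exactly -1.0
```

Both subtractions round to the same double, so the strict inequality is lost.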


Edited to add explanation:

If we ignore denormalized values, not-a-number values, infinities, and so on, and just focus on IEEE 754 double-precision floating-point values for the sake of having something concrete to look at, then, in terms of the binary representation, a floating-point value consists of a sign bit s (0 for positive, 1 for negative), an eleven-bit exponent e (with an offset of 1023, such that e=0 means 2^−1023 and e=1023 means 2^0, i.e. 1), and a 52-bit fixed-point significand m (representing 52 places past the binary point, so it ranges over [0,1) with finite precision). The actual value of the representation is therefore (−1)^s × (1 + m) × 2^(e−1023).
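A small sketch (Python, using `struct` to reinterpret the raw bits; the helper name `decompose` is just for illustration) pulls out those three fields:

```python
import struct

def decompose(x):
    # Reinterpret the 64 bits of a double as an unsigned integer,
    # then slice out sign (1 bit), exponent (11 bits), significand (52 bits).
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    significand = bits & ((1 << 52) - 1)
    return sign, exponent, significand

# 1.0 is (-1)^0 * (1 + 0) * 2^(1023 - 1023):
print(decompose(1.0))   # (0, 1023, 0)
# -2.0 is (-1)^1 * (1 + 0) * 2^(1024 - 1023):
print(decompose(-2.0))  # (1, 1024, 0)
```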

Because the significand is fixed-point, and has a fixed number of bits, the precision is very finite. A value like 1.00000000000000001 and a value like 1.00000000000000002 are identical for very many places past the decimal — more places than a double-precision significand can hold.
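You can see this directly in Python, where literals that differ only that far out parse to the very same double:

```python
x = 1.00000000000000001
y = 1.00000000000000002
print(x == y)     # True: both literals round to the same double
print(x == 1.0)   # True: in fact, both round to exactly 1.0
```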

When you perform addition or subtraction between a very large number and a very small number (relative to each other: in our example, 1 is "very large"; alternatively, we could have used 1 as the very small value and chosen a very large value of 10000000000000000), the resulting exponent is going to be determined almost entirely by the very large number, and the significand of the very small number has to get scaled appropriately. In our case, it gets divided by about 10^17; so it simply disappears. The significand doesn't hold enough bits to be able to distinguish that.
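That absorption step can be demonstrated in one line:

```python
big = 1.0
tiny = 1e-17   # needs ~17 more decimal places than a 52-bit significand holds
print(big + tiny == big)    # True: tiny is absorbed entirely
print((big + tiny) - big)   # 0.0, not 1e-17
```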

ruakh answered Sep 29 '22