Given 3 IEEE-754 floats a, b, c that are not +/-INF and not NaN and a < b, is it safe to assume that a - c < b - c? Or, can you give an example when this is incorrect?
Comparing for equality: floating-point math is not exact. Simple values like 0.1 cannot be precisely represented using binary floating-point numbers, and the limited precision of floating-point numbers means that slight changes in the order of operations or the precision of intermediates can change the result.
To compare two floating-point values, we have to take the desired precision into account. For example, the numbers 3.1428 and 3.1415 are the same up to a precision of 0.01, but at a finer precision such as 0.001 they are not the same.
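The tolerance idea can be sketched in Python as follows. The helper name `same_within` is made up for this illustration; `math.isclose` is the standard-library function for relative-tolerance comparison, and the tolerance values are just the ones from the example, not a recommendation.

```python
import math

x, y = 3.1428, 3.1415

def same_within(u, v, tol):
    """Hypothetical helper: True if u and v differ by at most tol."""
    return abs(u - v) <= tol

# The two values differ by roughly 0.0013.
coarse = same_within(x, y, 0.01)   # equal at a precision of 0.01
fine = same_within(x, y, 0.001)    # not equal at a precision of 0.001
print(coarse, fine)

# Exact equality fails for 0.1 + 0.2, but a relative tolerance accepts it.
print(0.1 + 0.2 == 0.3, math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))
```

Which tolerance is appropriate depends on the magnitudes involved; an absolute tolerance that works near 3 may be meaningless for values near 1e-20 or 1e20.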
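On some systems, the precision of floating-point numbers is either single or double, based on the number of hexadecimal digits in the fraction. (IEEE 754 binary formats instead have 24 significant bits in single precision and 53 in double precision, counting the implicit leading bit.) A small integer is a binary integer with a precision of 15 bits. The range of small integers is -32768 to +32767. A large integer is a binary integer with a precision of 31 bits.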
Floating-point decimal values generally do not have an exact binary representation. This is a side effect of how the CPU represents floating point data. For this reason, you may experience some loss of precision, and some floating-point operations may produce unexpected results.
Suppose a is approximately 0.00000000000000001, b is approximately 0.00000000000000002, and c is 1. Then a − c and b − c will both equal −1.
(That's assuming double-precision, a.k.a. 64-bit, values. For higher-precision values, you'll need to add some more zeroes.)
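The counterexample can be checked directly. This is a minimal sketch that assumes Python's `float` is an IEEE 754 double with the default round-to-nearest mode, which is the case on common platforms; the variable names `left` and `right` are only for illustration.

```python
a = 1e-17
b = 2e-17
c = 1.0

# The premise holds: a is strictly less than b.
premise = a < b

left = a - c    # rounds to -1.0
right = b - c   # also rounds to -1.0

# The strict inequality is lost: both differences are the same double.
print(premise, left, right, left < right)
```

Both differences land closer to -1.0 than to the nearest neighbouring double, so they compare equal and `left < right` is false.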
Edited to add explanation:
If we ignore denormalized values, not-a-number values, infinities, and so on, and focus on IEEE 754 double-precision floating-point values for the sake of having something concrete to look at, then, in terms of the binary representation, a floating-point value consists of a sign bit s (0 for positive, 1 for negative), an eleven-bit exponent e (with an offset of 1023, such that e=0 means 2^−1023 and e=1023 means 2^0, i.e. 1), and a 52-bit fixed-point significand m (representing 52 places past the binary point, so it ranges over [0,1) with finite precision). The actual value of the representation is therefore (−1)^s × (1 + m) × 2^(e−1023).
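To make the layout concrete, here is a small sketch that pulls the three fields out of a double and rebuilds the value from the formula above. The function names `decode_double` and `rebuild` are invented for this example, and `rebuild` is only meant for normal values (it deliberately ignores denormals, infinities, and NaNs, as the explanation does).

```python
import struct

def decode_double(x):
    """Split a double into (s, e, m): sign bit, biased exponent, fraction in [0, 1)."""
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                             # 1 sign bit
    e = (bits >> 52) & 0x7FF                   # 11 exponent bits
    m = (bits & ((1 << 52) - 1)) / (1 << 52)   # 52 fraction bits, scaled to [0, 1)
    return s, e, m

def rebuild(s, e, m):
    """Apply (-1)^s * (1 + m) * 2^(e - 1023); valid for normal values only."""
    return (-1) ** s * (1 + m) * 2.0 ** (e - 1023)

print(decode_double(1.0))
print(decode_double(-1.5))
print(rebuild(*decode_double(0.1)) == 0.1)
```

For 1.0 the exponent field is exactly the offset, 1023, and the fraction is zero, which matches the "e=1023 means 1" case in the text.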
Because the significand is fixed-point and has a fixed number of bits, the precision is strictly limited. A value like 1.00000000000000001 and a value like 1.00000000000000002 are identical for very many places past the decimal point, more places than a double-precision significand can hold.
When you perform addition or subtraction between a very large number and a very small number (relative to each other: in our example, 1 is "very large"; alternatively, we could have used 1 as the very small value and chosen a very large value of 10000000000000000), the resulting exponent is going to be determined almost entirely by the very large number, and the significand of the very small number has to get scaled appropriately. In our case, it gets divided by about 10^17, so it simply disappears. The significand doesn't hold enough bits to be able to distinguish that.
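The absorption effect can be observed in both directions mentioned here. This sketch assumes Python's `float` is an IEEE 754 double with round-to-nearest, ties-to-even; `sys.float_info.epsilon` is the gap between 1.0 and the next larger double (2^−52), and the other variable names are illustrative.

```python
import sys

eps = sys.float_info.epsilon   # gap between 1.0 and the next double above it

# 1 as the "very large" value: adding something far below eps changes nothing.
tiny = 1e-17
tiny_absorbed = (1.0 + tiny == 1.0)

# Adding a full eps does reach the next representable value.
eps_survives = (1.0 + eps != 1.0)

# 1 as the "very small" value: near 1e16, neighbouring doubles are 2 apart,
# so adding 1 rounds back to the starting value.
big = 10000000000000000.0
one_absorbed = (big + 1.0 == big)

print(eps, tiny_absorbed, eps_survives, one_absorbed)
```

The addend only has an effect once it is large enough, relative to the bigger operand, to move the rounded result to a different representable value.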