
Is it possible in floating point to return 0.0 subtracting two different values?

Due to the approximate nature of floating point, it's possible for two different pairs of operands to produce the same result.

Example:

    #include <iostream>

    int main() {
        std::cout.precision(100);

        double a = 0.5;
        double b = 0.5;
        double c = 0.49999999999999994;

        std::cout << a + b << std::endl; // output "exact" 1.0
        std::cout << a + c << std::endl; // output "exact" 1.0
    }

But is it also possible with subtraction? I mean: are there two different pairs of values, sharing one operand, that both return 0.0?

i.e. a - b == 0.0 and a - c == 0.0 for some a, b and c with b != c?

markzzz asked Feb 05 '19 09:02




2 Answers

The IEEE-754 standard was deliberately designed so that subtracting two values produces zero if and only if the two values are equal, except that subtracting an infinity from itself produces NaN and/or an exception.
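As a rough illustration of that guarantee, the sketch below checks the values from the question. The helper names `distinct_but_zero_difference` and `infinity_minus_itself_is_nan` are made up for this example, and it assumes a platform whose `double` is IEEE-754 binary64 with subnormals enabled (the usual default on x86-64).

```cpp
#include <cmath>
#include <limits>

// True when x and y differ yet x - y evaluates to zero.
// With IEEE-754 gradual underflow this is expected to be false for finite inputs.
bool distinct_but_zero_difference(double x, double y) {
    volatile double diff = x - y;  // volatile: keep the result rounded to double
    return x != y && diff == 0.0;
}

// The exception mentioned above: infinity minus itself yields NaN, not zero.
bool infinity_minus_itself_is_nan() {
    volatile double inf = std::numeric_limits<double>::infinity();
    double diff = inf - inf;
    return std::isnan(diff);
}
```

Under those assumptions, `distinct_but_zero_difference(0.5, 0.49999999999999994)` is false: the two values are adjacent doubles, and their difference is a small but non-zero number.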

Unfortunately, C++ does not require conformance to IEEE-754, and many C++ implementations use some features of IEEE-754 but do not fully conform.

A not uncommon behavior is to “flush” subnormal results to zero. This is part of a hardware design to avoid the burden of handling subnormal results correctly. If this behavior is in effect, the subtraction of two very small but different numbers can yield zero. (The numbers would have to be near the bottom of the normal range, having some significand bits in the subnormal range.)

Sometimes systems with this behavior may offer a way of disabling it.
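One portable way to probe whether results are being flushed is to compute a value that should be subnormal and see whether it comes out as zero. This is only a sketch: `subnormal_results_flushed` is a hypothetical helper, and the `volatile` qualifiers are there to stop the compiler from evaluating the division at compile time, where the run-time mode would not apply.

```cpp
#include <limits>

// Returns true if halving the smallest normal double produces zero,
// which suggests subnormal results are being flushed at run time.
bool subnormal_results_flushed() {
    volatile double smallest_normal = std::numeric_limits<double>::min();
    volatile double half = smallest_normal / 2.0;  // exactly representable as a subnormal
    return half == 0.0;
}
```

In a default floating-point environment that follows IEEE-754 this should return false; it would be expected to return true once a flush-to-zero mode is active.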

Another behavior to beware of is that C++ does not require floating-point operations to be carried out precisely as written. It allows “excess precision” to be used in intermediate operations and “contractions” of some expressions. For example, a*b - c*d may be computed by using one operation that multiplies a and b and then a second, fused multiply-add operation that multiplies c and d and subtracts the result from the previously computed a*b. This latter operation acts as if c*d were computed with infinite precision rather than rounded to the nominal floating-point format. In this case, a*b - c*d may produce a non-zero result even though a*b == c*d evaluates to true.

Some C++ implementations offer ways to disable or limit such behavior.
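The effect of contraction can be imitated explicitly with `std::fma`, which computes its product and sum with a single rounding. The two helpers below (`diff_rounded` and `diff_contracted`, names invented for this sketch) evaluate a*b - c*d both ways; the `volatile` locals are an assumption-laden trick to force each product to be rounded to `double` and to keep the compiler from contracting the first version on its own.

```cpp
#include <cmath>
#include <limits>

// a*b - c*d with both products rounded to double before subtracting.
double diff_rounded(double a, double b, double c, double d) {
    volatile double ab = a * b;  // volatile forces each product to be rounded and stored
    volatile double cd = c * d;
    return ab - cd;
}

// a*b - c*d as a contracting compiler might evaluate it:
// a*b is rounded, then c*d enters the subtraction unrounded.
double diff_contracted(double a, double b, double c, double d) {
    volatile double ab = a * b;
    return std::fma(-c, d, ab);  // ab - c*d with a single rounding at the end
}
```

With `x = 1.0 + std::numeric_limits<double>::epsilon()`, the exact square of x is not representable, so `diff_rounded(x, x, x, x)` is 0.0 while `diff_contracted(x, x, x, x)` is the negated rounding error of the product, -2^-104. With GCC and Clang, the `-ffp-contract=off` option is one way to ask the compiler not to form such contractions itself.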

Eric Postpischil answered Sep 22 '22 23:09



The gradual underflow feature of the IEEE floating-point standard prevents this. Gradual underflow is achieved through subnormal (denormal) numbers, which are spaced evenly (as opposed to logarithmically, like normal floating-point numbers) and lie between the smallest negative and positive normal numbers, with the zeroes in the middle. Because they are evenly spaced, the addition of two subnormal numbers of opposite sign (i.e. subtraction towards zero) is exact and therefore won't produce what you ask for. The smallest subnormal equals the smallest distance between adjacent normal numbers, so the difference between any two unequal normal numbers is at least one subnormal step in magnitude and never rounds to zero.
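The spacing can be inspected directly with `std::nextafter`. The helper `gap_above` below is a name made up for this sketch, which assumes IEEE-754 binary64 doubles in the default mode (no flush-to-zero or denormals-are-zero).

```cpp
#include <cmath>
#include <limits>

// Distance from x to the next representable double above it.
double gap_above(double x) {
    volatile double next = std::nextafter(x, std::numeric_limits<double>::infinity());
    return next - x;
}
```

Under those assumptions, `gap_above(0.0)`, `gap_above(std::numeric_limits<double>::denorm_min())` and `gap_above(std::numeric_limits<double>::min())` all equal `denorm_min()`: the step size stays constant through the subnormal range and the lowest normal binade, so differences in that region are exactly representable.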

If you disable IEEE conformance using a special denormals-are-zero (DAZ) or flush-to-zero (FTZ) mode of the CPU, then indeed you could subtract two small, close numbers which would otherwise result in a subnormal number, which would be treated as zero due to the mode of the CPU. A working example (Linux):

    // requires <xmmintrin.h>, <cmath>, <limits> and <iostream>; statements go inside main()
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);    // system specific
    double d = std::numeric_limits<double>::min(); // smallest normal
    double n = std::nextafter(d, 10.0);            // second smallest normal
    double z = d - n;                              // a negative subnormal (flushed to zero)
    std::cout << (z == 0) << '\n' << (d == n);

This should print

    1
    0

The first value, 1, indicates that the result of the subtraction is exactly zero, while the second value, 0, indicates that the operands are not equal.

eerorika answered Sep 21 '22 23:09
