IEEE Std 754 Floating-Point: let t := a - b, does the standard guarantee that a == b + t?

Assume that t, a, b are all double (IEEE Std 754) variables, and that neither a nor b is NaN (though either may be Inf). After t = a - b, is a == b + t guaranteed?

updogliu asked May 29 '12 00:05



1 Answer

Absolutely not. One obvious case is a=DBL_MAX, b=-DBL_MAX. Then t = a-b overflows to INFINITY, so b+t is also INFINITY, which is not equal to a.
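A minimal C sketch of the overflow case, for anyone who wants to check it. The helper names `roundtrip_holds` and `difference_overflows` are made up for illustration, and it assumes the default round-to-nearest mode and no `-ffast-math`-style flags.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true when a == b + (a - b) in double arithmetic.
   The volatile temporaries force each step to be stored as a double,
   so extended-precision registers cannot hide the rounding. */
bool roundtrip_holds(double a, double b)
{
    volatile double t = a - b;
    volatile double back = b + t;
    return a == back;
}

/* Hypothetical helper: true when a - b overflowed to +infinity. */
bool difference_overflows(double a, double b)
{
    volatile double t = a - b;
    return t == INFINITY;
}
```

With these, `difference_overflows(DBL_MAX, -DBL_MAX)` is true and `roundtrip_holds(DBL_MAX, -DBL_MAX)` is false, while an exact case such as `roundtrip_holds(3.0, 1.0)` is true.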

What may be more surprising is that there are cases where this happens without any overflow. Basically, they all arise when a-b is inexact, i.e. the exact difference has to be rounded. For example, if a is DBL_EPSILON/4 and b is -1, a-b is 1 (assuming the default rounding mode), and b+t is then 0, not a.
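The non-overflow case can be sketched the same way. The two helpers below are hypothetical names, and the sketch again assumes the default rounding mode; the `volatile` temporaries are there only to make sure each result is really rounded to double.

```c
#include <float.h>

/* Hypothetical helper: a - b, rounded to double. */
double rounded_difference(double a, double b)
{
    volatile double t = a - b;
    return t;
}

/* Hypothetical helper: b + t, rounded to double. */
double add_back(double b, double t)
{
    volatile double sum = b + t;
    return sum;
}
```

For a = DBL_EPSILON/4 and b = -1.0, `rounded_difference(a, b)` is exactly 1.0 (the tiny a is rounded away), and `add_back(b, 1.0)` is 0.0, which is not a.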

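The reason I mention this second example is that this is the canonical way of forcing rounding to a particular precision in IEEE arithmetic. For instance, if you have a number in the range [0,1) and want to round it to a fixed number of fractional bits, you add and then subtract a suitable power of two. Adding and subtracting 0x1p49 leaves 3 fractional bits, since doubles in [0x1p49, 0x1p50) are spaced 0x1p-3 apart; use 0x1p48 to keep 4.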
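A rough sketch of the add-then-subtract trick, with a hypothetical function name and the constant passed in as a parameter. It assumes x is in [0,1), the default round-to-nearest (ties-to-even) mode, and a compiler that is not allowed to simplify `(x + magic) - magic` back to x (so no `-ffast-math`). The hex float literals need C99.

```c
/* Hypothetical helper: rounds x to the spacing of doubles near magic.
   With magic = 0x1p49 adjacent doubles are 0x1p-3 (0.125) apart,
   with magic = 0x1p48 they are 0x1p-4 (0.0625) apart. */
double round_via_add_sub(double x, double magic)
{
    volatile double shifted = x + magic;      /* low bits of x are rounded off here */
    volatile double result = shifted - magic; /* this subtraction is exact */
    return result;
}
```

For example, `round_via_add_sub(0.3, 0x1p49)` gives 0.25 and `round_via_add_sub(0.3, 0x1p48)` gives 0.3125.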

R.. GitHub STOP HELPING ICE answered Oct 13 '22 16:10