How does DBL_MAX addition work?

Question

Code

#include<stdio.h>
#include<limits.h>
#include<float.h>

int f( double x, double y, double z){
  return  (x+y)+z == x+(y+z);
}

int ff( long long x, long long y, long long z){
  return  (x+y)+z == x+(y+z);
}

int main()
{
    printf("%d\n", f(DBL_MAX, DBL_MAX, -DBL_MAX));
    printf("%d\n", ff(LLONG_MAX, LLONG_MAX, -LLONG_MAX));
    return 0;
}

Output

0
1

I am unable to understand why the two functions behave differently. What is happening here?

Baum mit Augen · Accepted Answer

In the eyes of the C++ and C standards, the integer version definitely, and the floating point version potentially, invokes undefined behavior, because the result of the computation x + y is not representable in the type the arithmetic is performed in.^† So both functions may yield or even do anything.

However, many real world platforms offer additional guarantees for floating point operations and implement integers in a certain way that lets us explain the results you get.

Considering f, we note that many popular platforms implement floating point math as described in IEEE 754. Following the rules of that standard, we get for the LHS:

DBL_MAX + DBL_MAX = INF

and

INF - DBL_MAX = INF.

The RHS yields

DBL_MAX - DBL_MAX = 0

and

DBL_MAX + 0 = DBL_MAX

and thus LHS != RHS.
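
To make the two groupings easy to inspect, here is a minimal sketch that returns each side separately. The function names sum_left and sum_right are mine, not part of the question, and the results described below assume IEEE 754 doubles with the default round-to-nearest mode. The volatile temporaries are there so the intermediate sum is really rounded to double, which matters on targets that evaluate in a wider format.

```c
#include <float.h>

/* (x + y) + z, with the intermediate sum stored in a double.
   volatile (an illustrative choice) keeps the compiler from holding
   the sum in a wider register or folding it away. */
double sum_left(double x, double y, double z) {
    volatile double t = x + y;   /* DBL_MAX + DBL_MAX overflows here */
    return t + z;
}

/* x + (y + z), same treatment. */
double sum_right(double x, double y, double z) {
    volatile double t = y + z;   /* DBL_MAX - DBL_MAX is exactly 0 */
    return x + t;
}
```

Under those assumptions, sum_left(DBL_MAX, DBL_MAX, -DBL_MAX) compares greater than DBL_MAX (it is +infinity), while sum_right with the same arguments equals DBL_MAX, so the two sides differ.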

Moving on to ff: many platforms perform signed integer computation in two's complement. Two's complement addition is associative, so the comparison will yield true as long as the optimizer does not change it into something that contradicts two's complement rules.

The latter is entirely possible (for example see this discussion), so you cannot rely on signed integer overflow doing what I explained above. However, it seems that it "was nice" in this case.
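
If you want to check the two's complement reasoning without relying on signed overflow, you can do the same additions on unsigned long long, where wrapping is guaranteed. This is only a sketch: ff_wrapping and wrapped_sum_left are names I made up, and the unsigned arithmetic stands in for what a two's complement machine does with the bits, it is not what the compiler is required to do for ff.

```c
#include <limits.h>

/* Same comparison as ff, but carried out on unsigned long long.
   Converting a signed value to unsigned is well defined (modulo 2^N),
   and unsigned addition wraps instead of overflowing. */
int ff_wrapping(long long x, long long y, long long z) {
    unsigned long long ux = (unsigned long long)x;
    unsigned long long uy = (unsigned long long)y;
    unsigned long long uz = (unsigned long long)z;
    return (ux + uy) + uz == ux + (uy + uz);
}

/* Bit pattern of (x + y) + z computed with wrapping arithmetic. */
unsigned long long wrapped_sum_left(long long x, long long y, long long z) {
    return ((unsigned long long)x + (unsigned long long)y)
           + (unsigned long long)z;
}
```

For the arguments from the question, the intermediate sum LLONG_MAX + LLONG_MAX wraps to the bit pattern that two's complement reads as -2, and adding -LLONG_MAX wraps again to the bit pattern of LLONG_MAX, which is also what the right-hand grouping gives.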

^†Note that this never applies to unsigned integer arithmetic. In C++, unsigned integers implement arithmetic modulo 2^NumBits where NumBits is the number of bits of the type. In this arithmetic, every integer can be represented by picking a representative of its equivalence class in [0, 2^NumBits - 1]. So this arithmetic can never overflow.

For those doubting that the floating point case is potential UB: N4140 5/4 [expr] says

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.

which is the case. The inf and NaN stuff is allowed, but not required, in C++ and C floating point math. It is only required if std::numeric_limits<T>::is_iec559 is true for the floating point type T in question. (Or in C, if the implementation defines __STDC_IEC_559__. Otherwise, the Annex F stuff need not apply.) If either of the iec indicators guarantees us IEEE semantics, the behavior is well defined to do what I described above.
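
On the C side, a check for that indicator could look roughly like the following. The helper names iec559_claimed and overflow_gives_infinity are hypothetical, and the second one deliberately refuses to perform the addition when the macro is absent, since in that case nothing guarantees what the overflow does.

```c
#include <float.h>

/* 1 if the implementation claims Annex F (IEC 60559) conformance
   by defining __STDC_IEC_559__, 0 otherwise. */
int iec559_claimed(void) {
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* Returns -1 without touching the arithmetic if Annex F is not claimed.
   Otherwise returns 1 if DBL_MAX + DBL_MAX compares greater than DBL_MAX
   (i.e. the overflow produced +infinity), else 0.
   Assumes the default rounding mode is in effect. */
int overflow_gives_infinity(void) {
    if (!iec559_claimed())
        return -1;
    volatile double big = DBL_MAX;   /* volatile: force a runtime addition */
    volatile double sum = big + big;
    return sum > DBL_MAX;
}
```

Keep in mind that whether the macro is defined can depend on the compiler, the C library headers and the flags in use, so treat the result as a property of one particular build.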

Tags: c++, c, double

dazzieta
