is assigning two doubles guaranteed to yield the same bitset patterns?

Question

There are several posts here about floating point numbers and their nature. It is clear that comparing floats and doubles must always be done cautiously. Asking for equality has also been discussed and the recommendation is clearly to stay away from it.

But what if there is a direct assignment:

double a = 5.4;
double b = a;

assuming a is any non-NaN value - can a == b ever be false?

It seems that the answer is obviously no, yet I can't find any standard defining this behaviour in a C++ environment. IEEE-754 states that two floating point numbers with equal (non-NaN) bitset patterns are equal. Does this mean that I can continue comparing my doubles this way without having to worry about maintainability? Do I have to worry about other compilers / operating systems and their implementation regarding these lines? Or maybe a compiler that optimizes some bits away and ruins their equality?

I wrote a little program that generates and compares non-NaN random doubles forever - until it finds a case where a == b yields false. Can I compile/run this code anywhere and anytime in the future without having to expect a halt? (ignoring endianness and assuming sign, exponent and mantissa bit sizes / positions stay the same).

#include <cstdint>   // std::uint64_t
#include <cstring>   // memcpy
#include <iostream>
#include <random>

struct double_content {
    std::uint64_t mantissa : 52;
    std::uint64_t exponent : 11;
    std::uint64_t sign : 1;
};
static_assert(sizeof(double) == sizeof(double_content), "must be equal");


void set_double(double& n, std::uint64_t sign, std::uint64_t exponent, std::uint64_t mantissa) {
    double_content convert;
    memcpy(&convert, &n, sizeof(double));
    convert.sign = sign;
    convert.exponent = exponent;
    convert.mantissa = mantissa;
    memcpy(&n, &convert, sizeof(double_content));
}

void print_double(double& n) {
    double_content convert;
    memcpy(&convert, &n, sizeof(double));
    std::cout << "sign: " << convert.sign << ", exponent: " << convert.exponent << ", mantissa: " << convert.mantissa << " --- " << n << '\n';
}

int main() {
    std::random_device rd;
    std::mt19937_64 engine(rd());
    std::uniform_int_distribution<std::uint64_t> mantissa_distribution(0ull, (1ull << 52) - 1);
    std::uniform_int_distribution<std::uint64_t> exponent_distribution(0ull, (1ull << 11) - 1);
    std::uniform_int_distribution<std::uint64_t> sign_distribution(0ull, 1ull);

    double a = 0.0;
    double b = 0.0;

    bool found = false;

    while (!found){
        auto sign = sign_distribution(engine);
        auto exponent = exponent_distribution(engine);
        auto mantissa = mantissa_distribution(engine);

        //re-assign exponent for NaN cases
        if (mantissa) {
            while (exponent == (1ull << 11) - 1) {
                exponent = exponent_distribution(engine);
            }
        }
        //force -0.0 to be 0.0
        if (mantissa == 0u && exponent == 0u) {
            sign = 0u;
        }


        set_double(a, sign, exponent, mantissa);
        b = a;

        //here could be more (unmodifying) code to delay the next comparison

        if (b != a) { //not equal!
            print_double(a);
            print_double(b);
            found = true;
        }
    }
}

using Visual Studio Community 2017 Version 15.9.5

Max Langhof · Accepted Answer

The C++ standard clearly specifies in [basic.types]#3:

For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a potentially-overlapping subobject, if the underlying bytes ([intro.memory]) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1.

It gives this example:

T* t1p;
T* t2p;
// provided that t2p points to an initialized object ...
std::memcpy(t1p, t2p, sizeof(T));
// at this point, every subobject of trivially copyable type in *t1p contains
// the same value as the corresponding subobject in *t2p

The remaining question is what a value is. We find in [basic.fundamental]#12 (emphasis mine):

There are three floating-point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.

Since the C++ standard has no further requirements on how floating point values are represented, this is the only guarantee the standard gives you, as assignment is only required to preserve values ([expr.ass]#2):

In simple assignment (=), the object referred to by the left operand is modified by replacing its value with the result of the right operand.

As you correctly observed, IEEE-754 requires that non-NaN, non-zero floats compare equal if and only if they have the same bit pattern. So if your compiler uses IEEE-754-compliant floats, you should find that assignment of non-NaN, non-zero floating point numbers preserves bit patterns.

And indeed, your code

double a = 5.4;
double b = a;

should never allow (a == b) to return false. But as soon as you replace 5.4 with a more complicated expression, most of this nicety vanishes. It's not the exact subject of the article, but https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/ mentions several possible ways in which innocent looking code can yield different results (which breaks "identical to the bit pattern" assertions). In particular, you might be comparing an 80 bit intermediate result with a 64 bit rounded result, possibly yielding inequality.

Eric Postpischil · Answer

There are some complications here. First, note that the title asks a different question than the question. The title asks:

is assigning two doubles guaranteed to yield the same bitset patterns?

while the question asks:

can a == b ever be false?

The first of these asks whether different bits might occur from an assignment (which could be due to either the assignment not recording the same value as its right operand or due to the assignment using a different bit pattern that represents the same value), while the second asks whether, whatever bits are written by an assignment, the stored value must compare equal to the operand.

In full generality, the answer to the first question is no. Using IEEE-754 binary floating-point formats, there is a one-to-one map between non-zero numeric values and their encodings in bit patterns. However, this admits several cases where an assignment could produce a different bit pattern:

The right operand is the IEEE-754 −0 entity, but +0 is stored. This is not a proper IEEE-754 operation, but C++ is not required to conform to IEEE 754. Both −0 and +0 represent mathematical zero and would satisfy C++ requirements for assignment, so a C++ implementation could do this.
IEEE-754 decimal formats have one-to-many maps between numeric values and their encodings. By way of illustration, three hundred could be represented with bits whose direct meaning is 3•10² or bits whose direct meaning is 300•10⁰. Again, since these represent the same mathematical value, it would be permissible under the C++ standard to store one in the left operand of an assignment when the right operand is the other.
IEEE-754 includes many non-numeric entities called NaNs (for Not a Number), and a C++ implementation might store a NaN different from the right operand. This could include either replacing any NaN with a “canonical” NaN for the implementation or, upon assignment of a signaling NaN, indicating the signal in some way and then converting the signaling NaN to a quiet NaN and storing that.
Non-IEEE-754 formats may have similar issues.

Regarding the latter question, can a == b be false after a = b, where both a and b have type double, the answer is no. The C++ standard does require that an assignment replace the value of the left operand with the value of the right operand. So, after a = b, a must have the value of b, and therefore they are equal.

Note that the C++ standard does not impose any restrictions on the accuracy of floating-point operations (although I see this only stated in non-normative notes). So, theoretically, one might interpret assignment or comparison of floating-point values to be floating-point operations and say that they do not need to be accurate, so the assignment could change the value or the comparison could return an inaccurate result. I do not believe this is a reasonable interpretation of the standard; the lack of restrictions on floating-point accuracy is intended to allow latitude in expression evaluation and library routines, not simple assignment or comparison.

One should note the above applies specifically to a double object that is assigned from a simple double operand. This should not lull readers into complacency. Several similar but different situations can result in failure of what might seem intuitive mathematically, such as:

After float x = 3.4;, the expression x == 3.4 will generally evaluate as false, since 3.4 is a double and has to be converted to a float for the assignment. That conversion reduces precision and alters the value.
After double x = 3.4 + 1.2;, the expression x == 3.4 + 1.2 is permitted by the C++ standard to evaluate to false. This is because the standard permits floating-point expressions to be evaluated with more precision than the nominal type requires. Thus, 3.4 + 1.2 might be evaluated with the precision of long double. When the result is assigned to x, the standard requires that the excess precision be “discarded,” so the value is converted to a double. As with the float example above, this conversion may change the value. Then the comparison x == 3.4 + 1.2 may compare a double value in x to what is essentially a long double value produced by 3.4 + 1.2.

Tags:

c++

floating-point

language-lawyer

Stack Danny
