C++ Bit Representation of a Double

Question

My understanding was that doubles in C++ are stored according to the IEEE 754-2008 standard, where the first bit is the sign, the following 11 bits are the exponent, and the remaining 52 bits are the fraction. However, the code below and its output show something else.

Code

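#include <bitset>
#include <cstdint>   // std::uint64_t
#include <cstring>   // std::memcpy
#include <iostream>

int main() {
    double value_float = 1.5;
    std::uint64_t value_uint = 0;

    // Copy the raw bytes of the double into an integer of the same size.
    static_assert(sizeof(value_uint) == sizeof(value_float), "expects a 64-bit double");
    std::memcpy(&value_uint, &value_float, sizeof(value_uint));
    std::cout << value_uint << std::endl;

    // Print the 64 bits, most significant bit first.
    std::bitset<64> doubleBitset(value_uint);
    std::cout << "DoubleBitset for " << value_float << " is: ";
    std::cout << doubleBitset << std::endl;
}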

Output: DoubleBitset for 1.5 is: 0011111111111000000000000000000000000000000000000000000000000000

Why isn't the output 0b0 - 00000000001 1000000000000000000000000000000000000000000000000000?

Lala5th · Accepted Answer

The double stores its exponent with a bias. The 11-bit exponent field is stored as an unsigned integer, and 1023 is subtracted from it whenever it is used. As a result, positive exponents have the top bit of the field set, while an exponent of 0 has every bit except the top one set (0b01111111111). A good overview of the standard is the Wiki article.
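To make the bias concrete, here is a minimal sketch that pulls out the 11-bit exponent field and subtracts 1023. The helper names stored_exponent and unbiased_exponent are made up for illustration, and the code assumes double is a 64-bit IEEE 754 binary64 value, which is common but not guaranteed by the C++ standard.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw 11-bit exponent field of a double.
// Assumes double uses the IEEE 754 binary64 layout.
std::uint64_t stored_exponent(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return (bits >> 52) & 0x7FF;  // drop the 52 fraction bits, mask off the sign
}

// Hypothetical helper: removes the bias of 1023 to recover the power of two.
// Only meaningful for normal numbers (field not 0 and not 2047).
int unbiased_exponent(double value) {
    return static_cast<int>(stored_exponent(value)) - 1023;
}
```

For 1.5 the stored field is 1023 and the unbiased exponent is 0; for 2.0 the stored field is 1024, which is the first value with the top bit of the field set.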

Back to the question. Since 1.5 = 1.1 (binary) x 2^0, calculating the representation manually gives:
1) The first bit is 0, since the value is positive.
2) The exponent needs to be 0, so the field is set to 0b01111111111 (1023), which represents an exponent of 0.
3) The mantissa is what you expect, i.e. one bit set at the start followed by 51 zeros. The leading 1 before the binary point is implicit and is not stored.
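The manual calculation above can be checked by assembling the three fields into a 64-bit integer and copying it back into a double. This is only a sketch: pack_double_bits and bits_to_double are hypothetical names, and the code again assumes a 64-bit IEEE 754 double.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: packs sign (1 bit), biased exponent (11 bits), and
// fraction (52 bits) into the binary64 layout.
std::uint64_t pack_double_bits(std::uint64_t sign,
                               std::uint64_t biased_exponent,
                               std::uint64_t fraction) {
    const std::uint64_t fraction_mask = (std::uint64_t{1} << 52) - 1;
    return ((sign & 1) << 63) | ((biased_exponent & 0x7FF) << 52) | (fraction & fraction_mask);
}

// Hypothetical helper: reinterprets a 64-bit pattern as a double via memcpy.
// Assumes double is 64 bits wide.
double bits_to_double(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    double value = 0.0;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Calling pack_double_bits(0, 1023, std::uint64_t{1} << 51) yields 0x3FF8000000000000, the same pattern as in the question's output, and bits_to_double turns it back into 1.5.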

Edit: As pointed out in the comments, some exponent values have a special meaning. If the exponent field is 2047 (all ones) and the mantissa is all 0, the value is Inf or -Inf. If the exponent field is 2047 and the mantissa is not all 0, the value is NaN. To represent zero, both the mantissa and the exponent field must be 0. I like to think of these as special cases rather than a code, since most of them are conditional on the other parts of the double's representation, but Eric Postpischil's comment is another way to think about this.

Tags:

c++

double

compression

ieee-754

Nate K
