 

Implicit conversion from long long to float yields unexpected result

Tags:

c++

In an attempt to verify (using VS2012) a book's claim (2nd sentence) that

When we assign an integral value to an object of floating-point type, the fractional part is zero. 
Precision may be lost if the integer has more bits than the floating-point object can accommodate.

I wrote the following wee prog:

#include <iostream>
#include <iomanip>

using std::cout;
using std::setprecision;

int main()
{
    long long i = 4611686018427387905; // 2^62 + 2^0

    float f = i; 

    std::streamsize prec = cout.precision();

    cout << i << " " << setprecision(20) << f << setprecision(prec) << std::endl;

    return 0;
}

The output is

4611686018427387905 4611686018427387900

I expected output of the form

4611686018427387905 4611690000000000000

How is a 4-byte float able to retain so much info about an 8-byte integer? Is there a value for i that actually demonstrates the claim?

Cohomologous asked Jan 04 '17 02:01


2 Answers

Floats don't store their data in base 10; they store it in base 2. Thus, 4611690000000000000 isn't actually a very round number. Its binary representation is:

100000000000000000000111001111100001000001110001010000000000000

As you can see, recording that precisely would take far more significant bits than the 24 a float provides. The number that's actually printed, however, has the following binary representation:

11111111111111111111111111111111111111111111111111111111111100

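As you can see, that's a much rounder number: it is 2^62 - 4. The float itself holds exactly 2^62 = 4611686018427387904, so the discrepancy of 4 comes from the convert-to-base-10 step. The VS2012 runtime appears to print at most 17 significant digits and pad the rest with zeros, which turns ...904 into ...900.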

As an example of a number that won't fit in a float properly, try the number you expected:

4611690000000000000

You'll notice that it comes out quite differently: the nearest float is 4611689866718085120, about 1.3 × 10^11 away from the original.

IanPudney answered Oct 30 '22 22:10


The float only appears to retain so much information because you're working with a number that is so close to a power of 2.

The float format stores numbers in basically binary scientific notation. In your case, it gets stored as something like

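1.000...0001 * 2^62 (in binary, with 61 zeros between the binary point and the final 1).

The float format can't store 62 binary places: it keeps only 23 bits after the binary point (24 significant bits counting the leading 1), so the final 1 gets rounded off... but we're left with 2^62, which differs from the number you're trying to store by only 1.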

I'm bad at manufacturing examples, but CERT isn't; you can view an example of what happens with bungled number conversions here. Note that the example is in Java, but C++ implementations such as VS2012 use the same IEEE 754 floating-point formats. Also, the first example converts a 4-byte int to a 4-byte float, which supports the book's claim even more strongly: there's less integer information to store than in your example, yet precision is still lost.

ameed answered Oct 30 '22 23:10