conversion of double to string to double throws exception

The following code throws an std::out_of_range exception in Visual Studio 2013 where in my opinion it shouldn't:

#include <string>
#include <limits>

int main(int argc, char ** argv)
{
    double maxDbl = std::stod(std::to_string(std::numeric_limits<double>::max()));

    return 0;
}

I also tested the code with gcc 4.9.2, where it does not throw an exception. The issue seems to be caused by an inaccurate string representation after the conversion to string. In Visual Studio, std::to_string(std::numeric_limits<double>::max()) yields

179769313486231610000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000

which indeed seems too large. In gcc, however, it yields

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000

which seems to be smaller than the passed value.

However, isn't std::numeric_limits<double>::max() supposed to return the

maximum finite representable floating-point number?

So why do the string representations get off? What am I missing here?

asked Jul 27 '15 by sigy

1 Answer

Direct answer

Gcc (and Clang and VS2015) correctly return the integer value of (2^1024 - 1) - (2^(1024-53) - 1) = 2^1024 - 2^971, which is what is represented by 52 one bits of significand (plus the implicit leading bit) and an unbiased exponent of 1023 (2^1024 - 1 would be the integer with 1024 one bits; I just subtract all the bits below the 53 significant bits of the IEEE 754 format).

I can confirm that a large-integer library gives 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368

The previous representable floating-point value is 2^971 less (971 = 1023 - 52), that is: 179769313486231550856124328384506240234343437157459335924404872448581845754556114388470639943126220321960804027157371570809852884964511743044087662767600909594331927728237078876188760579532563768698654064825262115771015791463983014857704008123419459386245141723703148097529108423358883457665451722744025579520

The next value, 2^971 greater, is not representable; it would be: 179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216

But the value produced by MSVC 2013 and earlier is close to 2^1024 + 2^971, that is: 179769313486231610731333614426100589925524828262616317947942685512308090830973387504827396012048193870699768806228404251083258210739369062217227314575410731769485876273179688476358949112102859294830297395714877595371718127781702814782017661749531126051903195165027873311156314696040132728420308633064323416064 . As it is greater than any value representable in IEEE 754 double precision, it cannot be decoded back to a double.

At most, one could argue that any value between 2^1024 - 2^971 (std::numeric_limits<double>::max()) and 2^1024 could be rounded down to std::numeric_limits<double>::max(), but values greater than 2^1024 are clearly an overflow.


Discussion on accuracy

Only 16 decimal digits are accurate in a double; all further digits can be seen as garbage or random values, since they do not depend on the value itself but only on the way you choose to compute them. Just try to subtract 1e+288 (already a big value) from maxDbl and look at what happens:

double maxLess = maxDbl - 1.e+288;
if (maxLess == maxDbl) {
   std::cout << "Unchanged" << std::endl;
}
else std::cout << "Changed" << std::endl;

You should see ... Unchanged.

It just looks like VS 2013 is a little inconsistent in the way it rounds floating-point values: it rounded maxDbl up, one bit above the maximum actually representable value, and then could not decode the result back.

The problem is that the standard chose a %f format for std::to_string, which gives a false sense of accuracy. If you want to see an equivalent problem in gcc, just use:

#include <iostream>
#include <string>
#include <limits>
#include <iomanip>
#include <sstream>
#include <stdexcept>

int main() {
    double max = std::numeric_limits<double>::max();
    std::ostringstream ostr;
    ostr << std::setprecision(16) << max;
    std::string smax = ostr.str();
    std::cout << smax << std::endl;
    try {
        double m2 = std::stod(smax);
        std::cout << m2 << std::endl;
    } catch (const std::out_of_range &) {
        // the 16-digit string rounds above max, so stod throws
        std::cout << "out_of_range" << std::endl;
    }

    return 0;
}

Rounded to 16 digits, maxDbl correctly prints as 1.797693134862316e+308, but that string can no longer be decoded back.

And this one:

#include <iostream>
#include <string>
#include <limits>

int main() {
    double maxDbl = std::numeric_limits<double>::max();
    std::string smax = std::to_string(maxDbl);
    std::cout << smax << std::endl;
    
    std::string smax2 = "179769313486231570800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000";

    double max2 = std::stod(smax2);
    if (max2 == maxDbl) {
       std::cout << smax2 << " is same double as " << smax << std::endl;
    }

    return 0;
}

Displays:

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
179769313486231570800000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000 is same double as 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000

TL;DR: What I mean is that one big enough double value can of course be represented by an exact integer (per IEEE 754). But it also stands for all the reals from halfway to the previous representable value up to halfway to the next one. So any integer in that range would be an acceptable decimal representation of the double, and a value rounded to 16 decimal digits should be acceptable, but current standard libraries only print the maximum floating-point value truncated at 16 decimal digits. VS2013, however, produced a number above the top of that range, which is in any case an error.

Reference

IEEE floating point on wikipedia

answered Oct 29 '22 by Serge Ballesta