The Problem

Since real numbers cannot be represented exactly in a fixed amount of space, the result of a floating-point operation may not be representable at the required precision. Floating-point calculations are inexact mainly because they approximate rationals that cannot be represented finitely in base 2 and, more generally, because they approximate numbers that may not be representable in finitely many digits in any base.
Inexact: the “real” result of a computation cannot be exactly represented by a floating-point number. The silent response is to round the number, a behaviour that the vast majority of programs using floating-point numbers rely upon. However, rounding has to be correctly taken into account for sound analysis.
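For instance, 0.1 and 0.2 have no finite base-2 representation, so each literal is rounded when converted to a double, and the addition rounds once more; a minimal C++ sketch of this (standard library only):

#include <iomanip>
#include <iostream>

int main()
{
    double a = 0.1, b = 0.2;   // each literal is rounded to the nearest double
    double sum = a + b;        // the addition introduces one more rounding
    std::cout << std::setprecision(17)
              << "a + b == " << sum << '\n'                     // 0.30000000000000004
              << (sum == 0.3 ? "equal" : "not equal") << '\n';  // "not equal"
}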
Floating-point formats also differ in the precision they offer. Precision measures the number of bits used to represent a number's significand, and it can be used to estimate the impact of errors due to truncation and rounding. The precision of a floating-point number is determined by its mantissa: an IEEE 754 double carries 53 significand bits (52 stored plus one implicit), so it represents every integer up to 2^53 exactly, but not beyond.
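These limits can be queried from <limits>; a small sketch, assuming an IEEE 754 double (true on all mainstream platforms):

#include <iostream>
#include <limits>

int main()
{
    // 53 significand bits for an IEEE 754 double
    std::cout << "mantissa bits: " << std::numeric_limits<double>::digits << '\n';
    // 17 decimal digits suffice to round-trip any double exactly
    std::cout << "max_digits10:  " << std::numeric_limits<double>::max_digits10 << '\n';
    // every integer up to 2^53 is exact; above that, gaps appear
    double big = 9007199254740992.0;  // 2^53
    std::cout << (big + 1.0 == big ? "2^53 + 1 rounds back to 2^53" : "exact") << '\n';
}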
When I run the following code in VC++ 2013 (32-bit, no optimizations):
#include <cmath>
#include <iostream>
#include <limits>

double mulpow10(double const value, int const pow10)
{
    static double const table[] =
    {
        1E+000, 1E+001, 1E+002, 1E+003, 1E+004, 1E+005, 1E+006, 1E+007,
        1E+008, 1E+009, 1E+010, 1E+011, 1E+012, 1E+013, 1E+014, 1E+015,
        1E+016, 1E+017, 1E+018, 1E+019,
    };
    return pow10 < 0 ? value / table[-pow10] : value * table[+pow10];
}

int main(void)
{
    double d = 9710908999.008999;
    int j_max = std::numeric_limits<double>::max_digits10;
    while (j_max > 0 && (
        static_cast<double>(
            static_cast<unsigned long long>(
                mulpow10(d, j_max))) != mulpow10(d, j_max)))
    {
        --j_max;
    }
    double x = std::floor(d * 1.0E9);
    unsigned long long y1 = x;
    unsigned long long y2 = std::floor(d * 1.0E9);
    std::cout
        << "x == " << x << std::endl
        << "y1 == " << y1 << std::endl
        << "y2 == " << y2 << std::endl;
}
I get
x == 9.7109089990089994e+018
y1 == 9710908999008999424
y2 == 9223372036854775808
in the debugger.
I'm mind-blown. Can someone please explain to me how the heck y1 and y2 have different values?
This only seems to happen under /Arch:SSE2 or /Arch:AVX, not /Arch:IA32 or /Arch:SSE.
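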
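For context, 9710908999008999424 is larger than 2^63, and y2 came out as exactly 2^63 (9223372036854775808), which is the bit pattern x86 conversion instructions store for out-of-range inputs. 32-bit x86 has no single instruction that converts a double to an unsigned 64-bit integer, so the compiler must go through a signed conversion or a runtime helper. A common portable workaround is to split the range around 2^63 by hand; a minimal sketch (the helper name to_ull is mine, not from the question):

#include <cstdint>

// Convert a non-negative double to a 64-bit unsigned integer without relying
// on the compiler's double -> unsigned long long conversion path.
// Assumes 0 <= v < 2^64.
std::uint64_t to_ull(double v)
{
    const double two63 = 9223372036854775808.0;  // 2^63, exactly representable
    if (v >= two63)
        // Shift the value below 2^63 (exact by Sterbenz's lemma), convert
        // through the signed-safe range, then add 2^63 back as an integer.
        return static_cast<std::uint64_t>(v - two63) + (1ULL << 63);
    return static_cast<std::uint64_t>(v);
}

With this helper, to_ull(std::floor(d * 1.0E9)) should yield 9710908999008999424 regardless of the /Arch setting.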