This is mainly a follow-up to this other question, which was about a weird conversion from long to double and back again to long for big values.
I already know that converting a floating-point value to an integral type truncates and that, if the truncated value cannot be represented in the target type, the behaviour is undefined:
4.9 Floating-integral conversions [conv.fpint]
A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.
But here is my code to demonstrate the problem, assuming a little endian architecture, where both long long and long double use 64 bits:
#include <iostream>
#include <iomanip>
using namespace std;

int main()
{
    unsigned long long ull = 0xf000000000000000;
    long double d = static_cast<long double>(ull);
    // dump the IEEE-754 number for a little endian system
    unsigned char * pt = reinterpret_cast<unsigned char *>(&d);
    for (int i = sizeof(d) - 1; i >= 0; i--) {
        cout << hex << setw(2) << setfill('0') << static_cast<unsigned int>(pt[i]);
    }
    cout << endl;
    unsigned long long ull2 = static_cast<unsigned long long>(d);
    cout << ull << endl << d << endl << ull2 << endl;
    return 0;
}
The output is (using MSVC 2008 32 bits on an old XP 32 box):
43ee000000000000
f000000000000000
1.72938e+019
8000000000000000
Explanations for the values:
- 43ee000000000000 is the IEEE-754 dump of d: the sign bit is 0, the exponent field is 43e and the fraction field is e000000000000, that is 1110 followed by zeros
- the exponent is 43e; after removing the 3ff bias it gives 3f (63), so the value is binary 1.111 × 2^63, which is the exact representation of 0xf000000000000000 or 17293822569102704640
As that value can be represented as an unsigned long long, I expected its conversion to unsigned long long to give back the original value, but MSVC gives 0x8000000000000000 or 9223372036854775808.
The question is: is that conversion caused by undefined behaviour, as suggested by the accepted answer to the other question, or is it really an MSVC bug?
(Note: the same code compiled with Clang on a FreeBSD 10.1 box gives correct results)
For reference, here is the generated code:
unsigned long long ull2 = static_cast<unsigned long long>(d);
0041159E fld qword ptr [d]
004115A1 call @ILT+490(__ftol2) (4111EFh)
004115A6 mov dword ptr [ull2],eax
004115A9 mov dword ptr [ebp-40h],edx
And the code for _ftol2 seems to be (taken from the debugger at run time):
00411C66 push ebp
00411C67 mov ebp,esp
00411C69 sub esp,20h
00411C6C and esp,0FFFFFFF0h
00411C6F fld st(0)
00411C71 fst dword ptr [esp+18h]
00411C75 fistp qword ptr [esp+10h]
00411C79 fild qword ptr [esp+10h]
00411C7D mov edx,dword ptr [esp+18h]
00411C81 mov eax,dword ptr [esp+10h]
00411C85 test eax,eax
00411C87 je integer_QnaN_or_zero (411CC5h)
00411C89 fsubp st(1),st
00411C8B test edx,edx
00411C8D jns positive (411CADh)
00411C8F fstp dword ptr [esp]
00411C92 mov ecx,dword ptr [esp]
00411C95 xor ecx,80000000h
00411C9B add ecx,7FFFFFFFh
00411CA1 adc eax,0
00411CA4 mov edx,dword ptr [esp+14h]
00411CA8 adc edx,0
00411CAB jmp localexit (411CD9h)
00411CAD fstp dword ptr [esp]
00411CB0 mov ecx,dword ptr [esp]
00411CB3 add ecx,7FFFFFFFh
00411CB9 sbb eax,0
00411CBC mov edx,dword ptr [esp+14h]
00411CC0 sbb edx,0
00411CC3 jmp localexit (411CD9h)
00411CC5 mov edx,dword ptr [esp+14h]
00411CC9 test edx,7FFFFFFFh
00411CCF jne arg_is_not_integer_QnaN (411C89h)
00411CD1 fstp dword ptr [esp+18h]
00411CD5 fstp dword ptr [esp+18h]
00411CD9 leave
00411CDA ret
This is mainly a compilation of the comments on the question.
It appears that old MSVC versions incorrectly handled conversions between 64-bit integers and 64-bit double-precision numbers.
The bug is present in versions up to 2008.
MSVC 2010 is wrong in 32-bit mode and correct in 64-bit mode.
All versions starting with 2012 are correct.