
Incorrect double to long conversion

Tags: c++, visual-c++

This is mainly a follow-up to this other question, which was about a weird conversion from long to double and back to long for big values.

I already know that converting a floating point value to an integral type truncates, and that if the truncated value cannot be represented in the target type, the behaviour is undefined:

4.9 Floating-integral conversions [conv.fpint]

A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.
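To stay clear of the undefined case described in that paragraph, the range can be tested before converting. The following is only a minimal sketch: `checked_to_ull` is a hypothetical helper name, not something from the standard or from this question, and it assumes that unsigned long long is 64 bits wide.

```cpp
// Hypothetical helper, for illustration only.
// Converts d only when the truncated value fits in unsigned long long,
// so the undefined case of [conv.fpint] is never reached.
// Assumption: unsigned long long is 64 bits wide.
bool checked_to_ull(double d, unsigned long long& out)
{
    // 2^64 is a power of two, so it is exactly representable as a double.
    const double two64 = 18446744073709551616.0;

    // Values in (-1, 0) truncate to 0, which is representable.
    // The negated test also rejects NaN, because every comparison with NaN is false.
    if (!(d > -1.0 && d < two64)) {
        return false;
    }
    out = static_cast<unsigned long long>(d);
    return true;
}
```

With such a check, 17293822569102704640.0 is accepted (it is below 2^64), while 2^64 itself and -1.0 are rejected.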

But here is my code to demonstrate the problem, assuming a little endian architecture, where both long long and long double use 64 bits:

#include <iostream>
#include <iomanip>

using namespace std;

int main()
{
  unsigned long long ull = 0xf000000000000000;
  long double d = static_cast<long double>(ull);
  // dump the IEEE 754 representation, most significant byte first (assumes a little endian system)
  unsigned char * pt = reinterpret_cast<unsigned char *>(&d);
  for (int i = sizeof(d) -1; i>= 0; i--) {
      cout << hex << setw(2) << setfill('0') << static_cast<unsigned int>(pt[i]); 
  }
  cout << endl;
  unsigned long long ull2 = static_cast<unsigned long long>(d);
  cout << ull << endl << d << endl << ull2 << endl;
  return 0;
}

The output is (using MSVC 2008 32 bits on an old 32-bit XP box):

43ee000000000000
f000000000000000
1.72938e+019
8000000000000000

Explanations for the values:

  • 0xf000000000000000 is 17293822569102704640 in decimal, so the conversion to double is correct.
  • 43ee000000000000: the mantissa field is e000000000000; adding the implied leading 1 gives four 1 bits followed by zeros. The exponent field is 43e; removing the 3ff bias gives 3f, that is 63. The value is therefore 1.111 (binary) × 2^63, which is the exact representation of 0xf000000000000000 or 17293822569102704640 (ref)
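The decoding in the second bullet can be checked mechanically. This is a small sketch under the usual IEEE 754 binary64 layout (1 sign bit, 11 exponent bits, 52 fraction bits); `Binary64Fields`, `decode_binary64` and `value_of` are names made up for this illustration, and only normal numbers are handled.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical type for illustration: the three fields of a binary64 pattern.
struct Binary64Fields {
    unsigned sign;           // 1 bit
    int exponent;            // exponent after removing the 0x3ff bias
    std::uint64_t fraction;  // the 52 stored fraction bits
};

Binary64Fields decode_binary64(std::uint64_t bits)
{
    Binary64Fields f;
    f.sign = static_cast<unsigned>(bits >> 63);
    f.exponent = static_cast<int>((bits >> 52) & 0x7ff) - 1023;  // 1023 == 0x3ff
    f.fraction = bits & 0x000fffffffffffffULL;
    return f;
}

// Rebuilds the value of a normal number: (1 + fraction / 2^52) * 2^exponent.
// Zero, subnormals, infinities and NaN are not handled in this sketch.
double value_of(const Binary64Fields& f)
{
    double significand = 1.0 + std::ldexp(static_cast<double>(f.fraction), -52);
    double magnitude = std::ldexp(significand, f.exponent);
    return f.sign ? -magnitude : magnitude;
}
```

For the pattern 0x43ee000000000000 this gives sign 0, exponent 63 and fraction 0xe000000000000, so the significand is 1.875 (binary 1.111) and the value is 1.875 × 2^63 = 17293822569102704640.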

As that value can be represented as an unsigned long long, I expected its conversion to unsigned long long to give back the original value, but MSVC gives 0x8000000000000000 or 9223372036854775808.

The question is: is that result caused by undefined behaviour, as suggested by the accepted answer to the other question, or is it really an MSVC bug?

(Note: the same code compiled with Clang on a FreeBSD 10.1 box gives correct results)

For reference, here is the generated code:

  unsigned long long ull2 = static_cast<unsigned long long>(d);
0041159E  fld         qword ptr [d] 
004115A1  call        @ILT+490(__ftol2) (4111EFh) 
004115A6  mov         dword ptr [ull2],eax 
004115A9  mov         dword ptr [ebp-40h],edx 

And the code for _ftol2 seems to be (got from debugger at execution time):

00411C66  push        ebp  
00411C67  mov         ebp,esp 
00411C69  sub         esp,20h 
00411C6C  and         esp,0FFFFFFF0h 
00411C6F  fld         st(0) 
00411C71  fst         dword ptr [esp+18h] 
00411C75  fistp       qword ptr [esp+10h] 
00411C79  fild        qword ptr [esp+10h] 
00411C7D  mov         edx,dword ptr [esp+18h] 
00411C81  mov         eax,dword ptr [esp+10h] 
00411C85  test        eax,eax 
00411C87  je          integer_QnaN_or_zero (411CC5h) 
00411C89  fsubp       st(1),st 
00411C8B  test        edx,edx 
00411C8D  jns         positive (411CADh) 
00411C8F  fstp        dword ptr [esp] 
00411C92  mov         ecx,dword ptr [esp] 
00411C95  xor         ecx,80000000h 
00411C9B  add         ecx,7FFFFFFFh 
00411CA1  adc         eax,0 
00411CA4  mov         edx,dword ptr [esp+14h] 
00411CA8  adc         edx,0 
00411CAB  jmp         localexit (411CD9h) 
00411CAD  fstp        dword ptr [esp] 
00411CB0  mov         ecx,dword ptr [esp] 
00411CB3  add         ecx,7FFFFFFFh 
00411CB9  sbb         eax,0 
00411CBC  mov         edx,dword ptr [esp+14h] 
00411CC0  sbb         edx,0 
00411CC3  jmp         localexit (411CD9h) 
00411CC5  mov         edx,dword ptr [esp+14h] 
00411CC9  test        edx,7FFFFFFFh 
00411CCF  jne         arg_is_not_integer_QnaN (411C89h) 
00411CD1  fstp        dword ptr [esp+18h] 
00411CD5  fstp        dword ptr [esp+18h] 
00411CD9  leave            
00411CDA  ret 
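A note on the listing: it goes through fistp, which stores a signed 64-bit integer. When the operand does not fit and the invalid-operation exception is masked, the x87 FPU stores the "integer indefinite" pattern 0x8000000000000000, which appears consistent with the value observed here. A common way to build an unsigned conversion on top of a signed-only one is to subtract 2^63 first. The sketch below only illustrates that idea; `to_ull_via_signed` is a hypothetical name, and the caller is assumed to guarantee 0 <= d < 2^64 with d not NaN.

```cpp
// Hypothetical helper, for illustration only.
// Converts a double in [0, 2^64) to unsigned long long using nothing but a
// conversion to signed long long.
// Assumptions: long long is 64 bits wide, and the caller guarantees
// 0 <= d < 2^64 and that d is not NaN.
unsigned long long to_ull_via_signed(double d)
{
    const double two63 = 9223372036854775808.0;  // 2^63, exactly representable

    if (d < two63) {
        // Already inside the signed range: convert directly.
        return static_cast<unsigned long long>(static_cast<long long>(d));
    }

    // d is in [2^63, 2^64): the subtraction is exact and lands in [0, 2^63),
    // so the signed conversion is well defined. Add 2^63 back as an integer.
    long long low = static_cast<long long>(d - two63);
    return static_cast<unsigned long long>(low) + 0x8000000000000000ULL;
}
```

With this approach the value from the question, 17293822569102704640.0, converts back to 0xf000000000000000 without ever asking for a signed conversion of a value above the signed range.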
Serge Ballesta asked Nov 20 '15 14:11



1 Answer

This is mainly a compilation of the comments on the question.

It appears that old MSVC versions incorrectly process conversions between 64-bit integers and 64-bit double precision numbers.

The bug is present in versions up to and including 2008.

MSVC 2010 is wrong in 32-bit mode and correct in 64-bit mode.

All versions starting with 2012 are correct.

Serge Ballesta answered Oct 23 '22 20:10