Floating point differences between 64 bit and 32 bit with Round

I know about the approximation issues with floating point numbers, so I understand how 4.5 can get rounded down to 4 if it was approximated as 4.4999999999999991. My question is: why is there a difference between 32 bit and 64 bit when the same types are used?

In the code below I have two calculations. In 32 bit the value for MyRoundValue1 is 4 and the value for MyRoundValue2 is 5. In 64 bit they are both 4. Shouldn't the results be consistent with both 32 bit and 64 bit?

{$APPTYPE CONSOLE}
uses
  SysUtils; // needed for IntToStr
const
  MYVALUE1: Double = 4.5;
  MYVALUE2: Double = 5;
  MyCalc: Double = 0.9;
var
  MyRoundValue1: Integer;
  MyRoundValue2: Integer;
begin
  MyRoundValue1 := Round(MYVALUE1);
  MyRoundValue2 := Round(MYVALUE2 * MyCalc);
  WriteLn(IntToStr(MyRoundValue1));
  WriteLn(IntToStr(MyRoundValue2));
end.
Graymatter asked Jul 14 '15 19:07



1 Answer

With the 32 bit compiler, which targets the x87 floating point unit, this code:

MyRoundValue2 := Round(MYVALUE2 * MyCalc);

is compiled to:

MyRoundValue2 := Round(MYVALUE2 * MyCalc);
0041C4B2 DD0508E64100     fld qword ptr [$0041e608]
0041C4B8 DC0D10E64100     fmul qword ptr [$0041e610]
0041C4BE E8097DFEFF       call @ROUND
0041C4C3 A3C03E4200       mov [$00423ec0],eax

The default control word for the x87 unit under the Delphi RTL performs calculations to extended precision, that is, in the 80 bit format. So the floating point unit multiplies 5 by the closest double (64 bit) value to 0.9, which is:

0.90000 00000 00000 02220 44604 92503 13080 84726 33361 81640 625

Note that this value is slightly greater than 0.9. When it is multiplied by 5 and the result is rounded to the nearest 80 bit value, that result is still greater than 4.5. Hence Round(MYVALUE2 * MyCalc) returns 5.

On 64 bit, the floating point math is done on the SSE unit, which does not use 80 bit intermediate values. It turns out that 5 times the closest double to 0.9, rounded to double precision, is exactly 4.5. Round uses banker's rounding, so an exact 4.5 goes to the even neighbour 4. Hence Round(MYVALUE2 * MyCalc) returns 4 on 64 bit.
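The arithmetic behind both results can be written out exactly. This is only a worked sketch: the symbol d_{0.9} is notation introduced here for the closest double to 0.9, and the spacings quoted are those of values in the interval [4, 8) for a 64 bit mantissa (extended) and a 53 bit mantissa (double).

```latex
\begin{align*}
d_{0.9} &= \frac{8106479329266893}{2^{53}} \approx 0.9 + 2.22 \times 10^{-17} \\
5 \cdot d_{0.9} &= \frac{40532396646334465}{2^{53}} = 4.5 + 2^{-53} \\
\text{extended spacing } 2^{-61}: \quad 4.5 + 2^{-53} &\text{ is representable exactly, and it is } > 4.5 \\
\text{double spacing } 2^{-50}: \quad 4.5 + 2^{-53} &\text{ rounds to } 4.5 \text{ since } 2^{-53} < 2^{-51}
\end{align*}
```

So the x87 path hands Round a value just above the halfway point, which gives 5, while the SSE path hands it exactly 4.5, which banker's rounding takes to 4.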

You can persuade the 32 bit compiler to behave the same way as the 64 bit compiler by storing the product to a Double variable rather than relying on intermediate 80 bit values:

{$APPTYPE CONSOLE}
const
  MYVALUE1: Double = 4.5;
  MYVALUE2: Double = 5;
  MyCalc: Double = 0.9;
var
  MyRoundValue1: Integer;
  MyRoundValue2: Integer;
  d: Double;
begin
  MyRoundValue1 := Round(MYVALUE1);
  d := MYVALUE2 * MyCalc;
  MyRoundValue2 := Round(d);
  WriteLn(MyRoundValue1);
  WriteLn(MyRoundValue2);
end.

This program produces the same output as your 64 bit program.
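To observe the effect of the store directly, a small sketch along the following lines compares the same product held in an Extended and in a Double. The variable names e and d are illustrative only, and the sketch assumes the default x87 control word on 32 bit.

```delphi
{$APPTYPE CONSOLE}
const
  MYVALUE2: Double = 5;
  MyCalc: Double = 0.9;
var
  e: Extended; // 10 bytes on 32 bit Windows, an alias for Double on 64 bit Windows
  d: Double;
begin
  e := MYVALUE2 * MyCalc; // keeps the full x87 result where Extended is 80 bit
  d := MYVALUE2 * MyCalc; // always rounded to a 64 bit Double on store
  WriteLn('SizeOf(Extended) = ', SizeOf(Extended));
  WriteLn('Extended product > 4.5: ', e > 4.5);
  WriteLn('Double product = 4.5:   ', d = 4.5);
end.
```

If the reasoning above holds, the first comparison should print TRUE only on 32 bit, where Extended really is the 80 bit type, while the second should print TRUE on both targets.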

Or you can force the x87 unit to round intermediates to double precision:

{$APPTYPE CONSOLE}
uses
  SysUtils;
const
  MYVALUE1: Double = 4.5;
  MYVALUE2: Double = 5;
  MyCalc: Double = 0.9;
var
  MyRoundValue1: Integer;
  MyRoundValue2: Integer;
begin
  Set8087CW($1232); // <-- precision control set to double, so intermediates are rounded to 64 bit
  MyRoundValue1 := Round(MYVALUE1);
  MyRoundValue2 := Round(MYVALUE2 * MyCalc);
  WriteLn(MyRoundValue1);
  WriteLn(MyRoundValue2);
end.
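Changing the control word affects every later x87 calculation in the thread, so it may be preferable to change it only around the calculation that needs it. The following is a minimal sketch of that idea, not part of the original answer: the constant names PC_MASK and PC_DOUBLE are made up here, and the sketch assumes the 32 bit compiler, where Get8087CW and Set8087CW control the x87 unit.

```delphi
{$APPTYPE CONSOLE}
const
  MYVALUE2: Double = 5;
  MyCalc: Double = 0.9;
  PC_MASK   = $0300; // precision control bits of the x87 control word (hypothetical name)
  PC_DOUBLE = $0200; // precision control value for a 53 bit mantissa (hypothetical name)
var
  SavedCW: Word;
  MyRoundValue2: Integer;
begin
  SavedCW := Get8087CW; // remember the current control word
  try
    // Replace only the precision control bits, leaving the rest untouched
    Set8087CW((SavedCW and not PC_MASK) or PC_DOUBLE);
    MyRoundValue2 := Round(MYVALUE2 * MyCalc);
  finally
    Set8087CW(SavedCW); // restore the previous precision setting
  end;
  WriteLn(MyRoundValue2);
end.
```

Starting from the Delphi default control word of $1332, the masked value works out to the same $1232 used above.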
David Heffernan answered Oct 27 '22 14:10