Floating point differences between 64 bit and 32 bit with Round

Tags: delphi, delphi-xe7

I know about the approximation issues with floating point numbers, so I understand how 4.5 can get rounded down to 4 if it was approximated as 4.4999999999999991. My question is: why is there a difference between 32 bit and 64 bit when the same types are used?

In the code below I have two calculations. In 32 bit the value for MyRoundValue1 is 4 and the value for MyRoundValue2 is 5. In 64 bit they are both 4. Shouldn't the results be consistent with both 32 bit and 64 bit?

{$APPTYPE CONSOLE}
uses
  SysUtils; // needed for IntToStr
const
  MYVALUE1: Double = 4.5;
  MYVALUE2: Double = 5;
  MyCalc: Double = 0.9;
var
  MyRoundValue1: Integer;
  MyRoundValue2: Integer;
begin
  MyRoundValue1 := Round(MYVALUE1);
  MyRoundValue2 := Round(MYVALUE2 * MyCalc);
  WriteLn(IntToStr(MyRoundValue1));
  WriteLn(IntToStr(MyRoundValue2));
end.

asked Jul 14 '15 19:07

Graymatter

1 Answer

With the 32 bit compiler, which targets the x87 floating point unit, this code:

MyRoundValue2 := Round(MYVALUE2 * MyCalc);

is compiled to:

MyRoundValue2 := Round(MYVALUE2 * MyCalc);
0041C4B2 DD0508E64100     fld qword ptr [$0041e608]
0041C4B8 DC0D10E64100     fmul qword ptr [$0041e610]
0041C4BE E8097DFEFF       call @ROUND
0041C4C3 A3C03E4200       mov [$00423ec0],eax

The default x87 control word set by the Delphi RTL makes the unit perform calculations to 80 bit precision. So the floating point unit multiplies 5 by the closest 64 bit (double) value to 0.9, which is:

0.90000 00000 00000 02220 44604 92503 13080 84726 33361 81640 625

Note that this value is greater than 0.9, by exactly 2^-53 / 5. Multiplying it by 5 gives 4.5 + 2^-53, which fits in the 64 bit significand of an 80 bit extended value, so the x87 result is greater than 4.5. Hence Round(MYVALUE2 * MyCalc) returns 5.
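One way to check this is to keep the product in an Extended variable so that it is not narrowed to a double before the comparison. This is only a sketch, assuming the 32 bit Windows compiler, where Extended is an 80 bit type; the variable name e is illustrative and not part of the original program.

```pascal
{$APPTYPE CONSOLE}
const
  MYVALUE2: Double = 5;
  MyCalc: Double = 0.9;
var
  e: Extended; // assumed to be 80 bits wide (32 bit Windows target)
begin
  // Keep the product at the precision the x87 unit computed it in.
  e := MYVALUE2 * MyCalc;
  // On a 32 bit x87 target with the default control word this is
  // expected to print TRUE. Where Extended is the same as Double
  // (for example Win64), it would print FALSE instead.
  WriteLn(e > 4.5);
end.
```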

On 64 bit, the floating point math is done on the SSE unit, which does not use 80 bit intermediate values. Doubles between 4 and 8 are spaced 2^-50 apart, so 5 times the closest double to 0.9, rounded to double precision, is exactly 4.5. Round uses banker's rounding under the default rounding mode (ties go to the even integer), so Round(MYVALUE2 * MyCalc) returns 4 on 64 bit.
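That an exact 4.5 rounds to 4 rather than 5 comes from banker's rounding: with the default rounding mode, Round sends a tie to the nearest even integer. The sketch below illustrates this on a few exactly representable halves; the array name Halves is made up for the example.

```pascal
{$APPTYPE CONSOLE}
const
  // Each of these values is exactly representable as a Double.
  Halves: array[0..3] of Double = (2.5, 3.5, 4.5, 5.5);
var
  i: Integer;
begin
  // Assuming the default rounding mode (round to nearest, ties to even):
  //   2.5 -> 2
  //   3.5 -> 4
  //   4.5 -> 4
  //   5.5 -> 6
  for i := Low(Halves) to High(Halves) do
    WriteLn(Halves[i]:0:1, ' -> ', Round(Halves[i]));
end.
```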

You can persuade the 32 bit compiler to behave the same way as the 64 bit compiler by storing to a double rather than relying on intermediate 80 bit values:

{$APPTYPE CONSOLE}
const
  MYVALUE1: Double = 4.5;
  MYVALUE2: Double = 5;
  MyCalc: Double = 0.9;
var
  MyRoundValue1: Integer;
  MyRoundValue2: Integer;
  d: Double;
begin
  MyRoundValue1 := Round(MYVALUE1);
  d := MYVALUE2 * MyCalc;
  MyRoundValue2 := Round(d);
  WriteLn(MyRoundValue1);
  WriteLn(MyRoundValue2);
end.

This program produces the same output as your 64 bit program.

Or you can force the x87 unit to use 64 bit intermediates:

{$APPTYPE CONSOLE}
uses
  SysUtils;
const
  MYVALUE1: Double = 4.5;
  MYVALUE2: Double = 5;
  MyCalc: Double = 0.9;
var
  MyRoundValue1: Integer;
  MyRoundValue2: Integer;
begin
  Set8087CW($1232); //  <-- round intermediates to 64 bit (the default is $1332)
  MyRoundValue1 := Round(MYVALUE1);
  MyRoundValue2 := Round(MYVALUE2 * MyCalc);
  WriteLn(MyRoundValue1);
  WriteLn(MyRoundValue2);
end.
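Changing the control word affects every later x87 calculation, so it may be preferable to limit the change to the code that needs it. The following is a sketch of one way to do that, not part of the original program: it saves the current control word with Get8087CW, changes only the precision control field (bits 8 and 9), and restores the saved value afterwards. The name SavedCW is illustrative.

```pascal
{$APPTYPE CONSOLE}
const
  MYVALUE2: Double = 5;
  MyCalc: Double = 0.9;
var
  SavedCW: Word;
  MyRoundValue2: Integer;
begin
  SavedCW := Get8087CW; // remember the control word currently in effect
  try
    // Clear the precision control bits ($0300) and select double
    // precision ($0200). Starting from the default $1332 this gives $1232.
    Set8087CW((SavedCW and $FCFF) or $0200);
    MyRoundValue2 := Round(MYVALUE2 * MyCalc);
  finally
    Set8087CW(SavedCW); // restore the previous precision setting
  end;
  WriteLn(MyRoundValue2);
end.
```

With the 32 bit compiler this is expected to print 4, matching the 64 bit result, while leaving the rest of the program at its original precision.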

answered Oct 27 '22 14:10

David Heffernan
