Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delphi Tokyo 64-bit flushes denormal numbers to zero?

Tags:

During a short look at the source code of system.math, I discovered that the 64-bit version Delphi Tokyo 10.2.3 flushes denormal IEEE-Doubles to zero, as can be seen from then following program;

{$apptype console}
uses
  system.sysutils, system.math;
var
  x: double;
const
  twopm1030 : UInt64 = $0000100000000000; {2^(-1030)}
begin
  x := PDouble(@twopm1030)^;
  writeln(x);
  x := ldexp(1,-515);
  writeln(x*x);
  x := ldexp(1,-1030);
  writeln(x);
end.

For 32-bit the output is as expected

8.69169475979376E-0311
8.69169475979376E-0311
8.69169475979376E-0311

but with 64-bit I get

 8.69169475979375E-0311
 0.00000000000000E+0000
 0.00000000000000E+0000

So basically Tokyo can handle denormal numbers in 64-bit mode, the constant is written correctly, but from arithmetic operations or even with ldexp a denormal result is flushed to zero.

Can this observation be confirmed on other systems? If yes, where it is documented? (The only info I could find about zero-flushing is, that Denormals become zero when stored in a Real48).

Update: I know that for both 32- and 64-bit the single overload is used. For 32-bit the x87 FPU is used and the ASM code is virtually identical for all precisions (single, double, extended). The FPU always returns a 80-bit extended which is stored in a double without premature truncation. The 64-bit code does precision adjustment before storing. Meanwhile I filed an issue report (https://quality.embarcadero.com/browse/RSP-20925), with the focus on the inconsistent results for 32- or 64-bit.

like image 780
gammatester Avatar asked Jul 20 '18 15:07

gammatester


2 Answers

Update:

There is only a difference in how the compiler treats the overloaded selection.

As @Graymatter found out, the LdExp overload called is the Single type for both the 32-bit and the 64-bit compiler. The only difference is the codebase, where the 32-bit compiler is using asm code, while the 64-bit compiler has a purepascal implementation.

To fix the code to use the correct overload, explicitly define the type for the LdExp() first argument like this it works (64-bit):

program Project116;

{$APPTYPE CONSOLE}
uses
  system.sysutils, system.math;
var
  x: double;
const
  twopm1030 : UInt64 = $0000100000000000; {2^(-1030)}
begin
  x := PDouble(@twopm1030)^;
  writeln(x);
  x := ldexp(Double(1),-515);
  writeln(x*x);
  x := ldexp(Double(1),-1030);
  writeln(x);
  ReadLn;
end. 

Outputs:

 8.69169475979375E-0311
 8.69169475979375E-0311
 8.69169475979375E-0311

I would say that this behaviour should be reported as a RTL bug, since the overloaded function selected in your case is the Single type. The resulting type is a Double and the compiler should definitely adapt accordingly. since the 32-bit and the 64-bit compiler should produce the same result.


Note, the Double(1) typecast for floating point types, was introduced in Delphi 10.2 Tokyo. For solutions in prevoius versions, see What is first version of Delphi which allows typecasts like double(10).

like image 92
LU RD Avatar answered Sep 20 '22 15:09

LU RD


The problem here is that Ldexp(single) is returning different results depending on whether the ASM code is being called or whether the pascal code is called. In both cases, the compiler is calling the Single version of the overload because the type isn't specified in the call.

Your pascal code which is executed in the Win64 scenario tries to deal with the exponent less than -126 but the method is still not able to correctly calculate the result because single numbers are limited to an 8 bit exponent. The assembler seems to get around this but I didn't look into it in much detail as to why that's the case.

function Ldexp(const X: Single; const P: Integer): Single;
  { Result := X * (2^P) }
{$IFNDEF X86ASM}
var
  T: Single;
  I: Integer;
const
  MaxExp =  127;
  MinExp = -126;
  FractionOfOne = $00800000;
begin
  T := X;
  Result := X;
  case T.SpecialType of
    fsDenormal,
    fsNDenormal,
    fsPositive,
    fsNegative:
      begin
        FClearExcept;
        I := P;
        if I > MaxExp then 
        begin
          T.BuildUp(False, FractionOfOne, MaxExp);
          Result := Result * T;
          I := I - MaxExp;
          if I > MaxExp then I := MaxExp;
        end
        else if I < MinExp then
        begin
          T.BuildUp(False, FractionOfOne, MinExp);
          Result := Result * T;
          I := I - MinExp;
          if I < MinExp then I := MinExp;
        end;
        if I <> 0 then
        begin
          T.BuildUp(False, FractionOfOne, I);
          Result := Result * T;
        end;
        FCheckExcept;
      end;
//    fsZero,
//    fsNZero,
//    fsInf,
//    fsNInf,
//    fsNaN:
  else
    ;
  end;
end;
{$ELSE X86ASM}
{$IF     defined(CPUX86) and defined(IOS)} // iOS/Simulator
...
{$ELSE} 
asm // StackAlignSafe
        PUSH    EAX
        FILD    dword ptr [ESP]
        FLD     X
        FSCALE
        POP     EAX
        FSTP    ST(1)
        FWAIT
end;
{$ENDIF}
{$ENDIF X86ASM}

As LU RD suggested, you can get around the problem by forcing the methods to call the Double overload. There is a bug but that bug is that the ASM code doesn't match the pascal code in Ldexp(const X: Single; const P: Integer), not that a different overload is being called.

like image 20
Graymatter Avatar answered Sep 21 '22 15:09

Graymatter