During a short look at the source code of system.math, I discovered that the 64-bit version Delphi Tokyo 10.2.3 flushes denormal IEEE-Doubles to zero, as can be seen from then following program;
{$apptype console}
uses
system.sysutils, system.math;
var
x: double;
const
twopm1030 : UInt64 = $0000100000000000; {2^(-1030)}
begin
x := PDouble(@twopm1030)^;
writeln(x);
x := ldexp(1,-515);
writeln(x*x);
x := ldexp(1,-1030);
writeln(x);
end.
For 32-bit the output is as expected
8.69169475979376E-0311
8.69169475979376E-0311
8.69169475979376E-0311
but with 64-bit I get
8.69169475979375E-0311
0.00000000000000E+0000
0.00000000000000E+0000
So basically Tokyo can handle denormal numbers in 64-bit mode, the constant is written correctly, but from arithmetic operations or even with ldexp a denormal result is flushed to zero.
Can this observation be confirmed on other systems? If yes, where it is documented? (The only info I could find about zero-flushing is,
that Denormals become zero when stored in a Real48
).
Update: I know that for both 32- and 64-bit the single overload is used. For 32-bit the x87 FPU is used and the ASM code is virtually identical for all precisions (single, double, extended). The FPU always returns a 80-bit extended which is stored in a double without premature truncation. The 64-bit code does precision adjustment before storing. Meanwhile I filed an issue report (https://quality.embarcadero.com/browse/RSP-20925), with the focus on the inconsistent results for 32- or 64-bit.
Update:
There is only a difference in how the compiler treats the overloaded selection.
As @Graymatter found out, the LdExp
overload called is the Single
type for both the 32-bit and the 64-bit compiler. The only difference is the codebase, where the 32-bit compiler is using asm code, while the 64-bit compiler has a purepascal implementation.
To fix the code to use the correct overload, explicitly define the type for the LdExp()
first argument like this it works (64-bit):
program Project116;
{$APPTYPE CONSOLE}
uses
system.sysutils, system.math;
var
x: double;
const
twopm1030 : UInt64 = $0000100000000000; {2^(-1030)}
begin
x := PDouble(@twopm1030)^;
writeln(x);
x := ldexp(Double(1),-515);
writeln(x*x);
x := ldexp(Double(1),-1030);
writeln(x);
ReadLn;
end.
Outputs:
8.69169475979375E-0311
8.69169475979375E-0311
8.69169475979375E-0311
I would say that this behaviour should be reported as a RTL bug, since the overloaded function selected in your case is the
since the 32-bit and the 64-bit compiler should produce the same result.Single
type. The resulting type is a Double
and the compiler should definitely adapt accordingly.
Note, the Double(1)
typecast for floating point types, was introduced in Delphi 10.2 Tokyo. For solutions in prevoius versions, see What is first version of Delphi which allows typecasts like double(10).
The problem here is that Ldexp(single)
is returning different results depending on whether the ASM code is being called or whether the pascal code is called. In both cases, the compiler is calling the Single version of the overload because the type isn't specified in the call.
Your pascal code which is executed in the Win64 scenario tries to deal with the exponent less than -126 but the method is still not able to correctly calculate the result because single numbers are limited to an 8 bit exponent. The assembler seems to get around this but I didn't look into it in much detail as to why that's the case.
function Ldexp(const X: Single; const P: Integer): Single;
{ Result := X * (2^P) }
{$IFNDEF X86ASM}
var
T: Single;
I: Integer;
const
MaxExp = 127;
MinExp = -126;
FractionOfOne = $00800000;
begin
T := X;
Result := X;
case T.SpecialType of
fsDenormal,
fsNDenormal,
fsPositive,
fsNegative:
begin
FClearExcept;
I := P;
if I > MaxExp then
begin
T.BuildUp(False, FractionOfOne, MaxExp);
Result := Result * T;
I := I - MaxExp;
if I > MaxExp then I := MaxExp;
end
else if I < MinExp then
begin
T.BuildUp(False, FractionOfOne, MinExp);
Result := Result * T;
I := I - MinExp;
if I < MinExp then I := MinExp;
end;
if I <> 0 then
begin
T.BuildUp(False, FractionOfOne, I);
Result := Result * T;
end;
FCheckExcept;
end;
// fsZero,
// fsNZero,
// fsInf,
// fsNInf,
// fsNaN:
else
;
end;
end;
{$ELSE X86ASM}
{$IF defined(CPUX86) and defined(IOS)} // iOS/Simulator
...
{$ELSE}
asm // StackAlignSafe
PUSH EAX
FILD dword ptr [ESP]
FLD X
FSCALE
POP EAX
FSTP ST(1)
FWAIT
end;
{$ENDIF}
{$ENDIF X86ASM}
As LU RD suggested, you can get around the problem by forcing the methods to call the Double overload. There is a bug but that bug is that the ASM code doesn't match the pascal code in Ldexp(const X: Single; const P: Integer)
, not that a different overload is being called.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With