A colleague of mine picked up a discrepancy between Win32 and Win64 code compiled by Delphi in how it handles NaN's. Take the following code as an example. When compiled in 32 bit we get no messages but when compiled with 64 bit we get both comparisons returning true.
program TestNaNs;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils,
System.Math;
var
nanDouble: Double;
zereDouble: Double;
nanSingle: Single;
zeroSingle: Single;
begin
SetExceptionMask(exAllArithmeticExceptions);
nanSingle := NaN;
zeroSingle := 0.0;
if nanSingle <> zeroSingle then
WriteLn('nanSingle <> zeroSingle');
nanDouble := NaN;
zereDouble := 0.0;
if nanDouble <> zereDouble then
WriteLn('nanDouble <> zeroDouble');
ReadLn;
end.
My understanding of the IEEE standard is that <> should return true but all other operations should return false. So in this case, it looks like the 64 bit version is correct and the 32 bit version is incorrect. The code generated by both is very different with the 64 bit version generating SSE code.
For 32 bit:
TestNaNs.dpr.21: if nanSingle <> zeroSingle then
0041A552 D905E01E4200 fld dword ptr [$00421ee0]
0041A558 D81DE41E4200 fcomp dword ptr [$00421ee4]
0041A55E 9B wait
0041A55F DFE0 fstsw ax
0041A561 9E sahf
0041A562 7419 jz $0041a57d
and for 64 bit:
TestNaNs.dpr.21: if nanSingle <> zeroSingle then
000000000042764E F3480F5A05C9ED0000 cvtss2sd xmm0,qword ptr [rel $0000edc9]
0000000000427657 F3480F5A0DC4ED0000 cvtss2sd xmm1,qword ptr [rel $0000edc4]
0000000000427660 660F2EC1 ucomisd xmm0,xmm1
0000000000427664 7A02 jp Project63 + $68
0000000000427666 7420 jz Project63 + $88
My question is this. Is this an issue with the Delphi compiler or a caveat with the Intel CPU's?
The IEEE 754 standard defines arithmetic formats, operations, rounding rules, exceptions etc. for floating point computation. The Delphi compiler implements floating point arithmetic on top of the available hardware units. For the 32 bit Windows compiler this is the x87 unit, and for the 64 bit Windows compiler this is the SSE unit. Both of these hardware units conform to the IEEE 754 standard.
The difference that you are observing arises at the language implementation level. Let us look at the two versions in more detail.
The comparison statement is compiled to this:
TestNaNs.dpr.19: if nanDouble <> zeroDouble then 0041C4C8 DD05C03E4200 fld qword ptr [$00423ec0] 0041C4CE DC1DC83E4200 fcomp qword ptr [$00423ec8] 0041C4D4 9B wait 0041C4D5 DFE0 fstsw ax 0041C4D7 9E sahf 0041C4D8 7419 jz $0041c4f3
The Intel software developers manual says that an unordered comparison is indicated by the flags C3, C2 and C0 being set to 1. The full table is here:
Condition C3 C2 C0 ST(0) > Source 0 0 0 ST(0) < Source 0 0 1 ST(0) = Source 1 0 0 Unordered 1 1 1
When you inspect the FPU under the debugger, you can see that this us the case.
0041C4D5 DFE0 fstsw ax 0041C4D7 9E sahf 0041C4D8 7419 jz $0041c4f3
This transfers various bits from of the FPU status register into the CPU flags, see the manual for precise details of which flags go where. The the branch is made if ZF is set. The value of ZF comes from the C3 FPU flag, which, reading from the table above, is set for the unordered case.
In fact, the entire branching code can be expressed in pseudo code as:
jump if C3 = 1
So, looking at the table above, it is clear that if one of the operands is a NaN then any floating point equality comparison evaluates as equals.
The comparison statement is compiled to this:
TestNaNs.dpr.19: if nanDouble <> zeroDouble then 0000000000428EB8 F20F100548E50000 movsd xmm0,qword ptr [rel $0000e548] 0000000000428EC0 660F2E0548E50000 ucomisd xmm0,qword ptr [rel $0000e548] 0000000000428EC8 7A02 jp TestNaNs + $5C 0000000000428ECA 7420 jz TestNaNs + $7C
The comparison is performed by the ucomisd
instruction. The manual gives this psuedo code:
RESULT ← UnorderedCompare(SRC1[63:0] <> SRC2[63:0]) { (* Set EFLAGS *) CASE (RESULT) OF GREATER_THAN: ZF, PF, CF ← 000; LESS_THAN: ZF, PF, CF ← 001; EQUAL: ZF, PF, CF ← 100; UNORDERED: ZF, PF, CF ← 111; ESAC; OF, AF, SF ← 0;
Notice that in this instruction, the ZF, PF and CF flags are exactly analagous to the C3, C2 and C0 flags on the x87 unit.
The branching is handled by this code:
0000000000428EC8 7A02 jp TestNaNs + $5C 0000000000428ECA 7420 jz TestNaNs + $7C
Notice that there is first a test of the parity flag PF (the jp
instruction), and then the zero flag ZF (the jz
instruction). The compiler has therefore emitted code to handle the unordered case (i.e. one of the operands is NaN). This is handled first with the jp
. Once that is handled, the compiler then checks the zero flag ZF which (because NaNs have been dealt with) is set if and only if the two operands are equal.
The different behaviour is down to the different compilers taking different choices in how to implement the comparison operators. In both situations the hardware is IEEE 754 compliant, and perfectly capable of comparing NaNs as specified by the standard.
My best guess would be that the decisions for the 32 bit compiler were taken a very long time ago. Some of these decisions are questionable. In my view an equality comparison with a NaN operand should evaluate not equals irrespective of the other operand. The weight of history, felt through a desire to maintain backwards compatibility, means that these questionable decisions have never been addressed.
When the 64 bit compiler was created, more recently, the Embarcadero engineers decided to right some of these mistakes. They presumably felt that the break to a new architecture allowed them the freedom to do so.
In an ideal world, the 32 bit compiler could be configured to behave the same way as the 64 bit compiler, by setting a compiler switch.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With