I have two similar programs one in C++ and another in D.
The compilation is on on Windows7 64bit, to 64bit binaries.
C++ version, VS 2013:
#include <iostream>
#include <string>
int main(int argc, char* argv[])
{
float eps = 1.0f;
float f = 0.0f;
while (f + eps != f)
f += 1.0f;
std::cout << "eps = " + std::to_string(eps) + ", max_f = " + std::to_string(f) << std::endl;
return 0;
}
D version, DMD v2.066.1:
import std.stdio;
import std.conv;
int main(string[] argv)
{
float eps = 1.0f;
float f = 0.0f;
while (f + eps != f)
f += 1.0f;
writeln("eps = " ~ to!string(eps) ~ ", max_f = " ~ to!string(f));
return 0;
}
C++ version works as expected and finds that f + e == f when f = 16777216.
But D version hungs forever. When I put breakpoint I see that in D version f also 16777216 (after running for some time) and Watch window (I use VisualD) shows that (f + e != f) is 'false' so the loop should be terminate but it's not the case during runtime.
I think assembly could give the answer but I'm not very good with it.
I'm new to D, so it should be the case that I misused the language/compiler (compiled with DMD just as 'dmd test.d' without additional options and also from VS with VisualD with default options). Any ideas what could be wrong with D version of the program? Thanks!
Disassembly:
C++:
000000013F7D1410 mov rax,rsp
000000013F7D1413 push rbp
000000013F7D1414 lea rbp,[rax-5Fh]
000000013F7D1418 sub rsp,0E0h
000000013F7D141F mov qword ptr [rbp+17h],0FFFFFFFFFFFFFFFEh
000000013F7D1427 mov qword ptr [rax+8],rbx
000000013F7D142B movaps xmmword ptr [rax-18h],xmm6
000000013F7D142F xorps xmm1,xmm1
float eps = 1.0f;
float f = 0.0f;
000000013F7D1432 movss xmm6,dword ptr [__real@3f800000 (013F7D67E8h)]
000000013F7D143A nop word ptr [rax+rax]
f += 1.0f;
000000013F7D1440 addss xmm1,xmm6
while (f + eps != f)
000000013F7D1444 movaps xmm0,xmm1
000000013F7D1447 addss xmm0,xmm6
000000013F7D144B ucomiss xmm0,xmm1
000000013F7D144E jp main+30h (013F7D1440h)
000000013F7D1450 jne main+30h (013F7D1440h)
D:
000000013F761002 mov ebp,esp
000000013F761004 sub rsp,50h
{
float eps = 1.0f;
000000013F761008 xor eax,eax
000000013F76100A mov dword ptr [rbp-50h],eax
000000013F76100D movss xmm0,dword ptr [rbp-50h]
000000013F761012 movss dword ptr [f],xmm0
float f = 0.0f;
while (f + eps != f)
f += 1.0f;
000000013F761017 movss xmm1,dword ptr [__NULL_IMPORT_DESCRIPTOR+1138h (013F7C3040h)]
000000013F76101F movss xmm2,dword ptr [f]
000000013F761024 addss xmm2,xmm1
000000013F761028 movss dword ptr [f],xmm2
000000013F76102D fld dword ptr [f]
000000013F761030 fadd dword ptr [__NULL_IMPORT_DESCRIPTOR+1138h (013F7C3040h)]
000000013F761036 fld dword ptr [f]
000000013F761039 fucomip st,st(1)
000000013F76103B fstp st(0)
000000013F76103D jne D main+17h (013F761017h)
000000013F76103F jp D main+17h (013F761017h)
Accept harold's answer that program behavior is due to the mixed FPU and SSE usage.
Here's a summary what happens in D assembly snippet. In fact the loop will run forever.
SSE behaves strictly according to IEEE-754 when f reaches 16777216.0 and we add 1.0 to this value (f += 1.0f) we still obtain 16777216.0 in xmm2 register, then we store it to memory.
(f + eps != f) expression is computed on the FPU. Since FPU registers have enough precision (f+eps) results in 16777217.0. If we stored this result back to memory into float variable then we would get expected value 16777216.0 (since 16777217.0 is not represented as float). And (f + eps != f) would be 'false' and loop would terminate. But we do not store any numbers back to memory and perform comparison on the FPU (since we have both operands). It means that we compare one number that is computed strictly according to IEEE-754 (f) and another that is computed with 80bit accuracy (f+eps). 16777216.0 != 16777217.0 and the loop runs forever.
I'm not an expert in this area but for me it looks like that doing floating point with SSE instructions is more robust as was demonstrated in C++ version of the program.
I had a discussion on the D forum http://forum.dlang.org/thread/[email protected]
It turned out that program behaves correctly - it's according to the language specification that intermediate calculations can be performed with higher accuracy.
The robust implementation for any D compiler is:
import std.stdio;
int main()
{
const float eps = 1.0f;
const float step = 1.0;
float f = 0.0f;
float fPlusEps = f + eps;
while (f != fPlusEps)
{
f += step;
fPlusEps = f + eps;
}
writeln("eps = ", eps, ", max_f = ", f);
return 0;
}
Mixed FPU and SSE code, that's .. really strange. I see absolutely no reason to implement it this way.
But they have, and the result is that f + eps != f
is evaluated with 80bit extended precision, whilef += 1.0f
is evaluated using 32bit floats.
That means the loop can never end, since f
will stop going up before the value that makesf + eps != f
false (which, in 80bit precision, is huge) is reached.
Trying to break a loop with != or == with floating point values is looking for troubles.
The different behavior is mot likely due to the float to double to 80-bits internal floating point conversion compiler may adopt when passing values to the FPU.
When extending the mantissa, in particular- some compilers or optimizer can decide to let the less significant bit "random" instead of zeroed. So 1.0f
, when given to the FPU may become 1.000000000000000000000012134432
that -according to a float
- precision, is still 1.0
, but wen 1.000000000000000000000012134432
and 1.000000000000000000000089544455
(the two tail are random) are compared by the FPU, look different.
You should verify how C++ and D compiler treat the floating point extension/reduction and eventually configure the appropriate switches: if the two compilers are not from the same manufacturer, thay had probably made different choices for their respective defaults.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With