Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Floating point maxing out loop doesn't terminate in D, works in C++

I have two similar programs one in C++ and another in D.

The compilation is on on Windows7 64bit, to 64bit binaries.

C++ version, VS 2013:

#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
    float eps = 1.0f;
    float f = 0.0f;
    while (f + eps != f)
        f += 1.0f;

    std::cout << "eps = " + std::to_string(eps) + ", max_f = " + std::to_string(f) << std::endl;
    return 0;
}

D version, DMD v2.066.1:

import std.stdio;
import std.conv;

int main(string[] argv)
{
    float eps = 1.0f;
    float f = 0.0f;
    while (f + eps != f)
        f += 1.0f;

    writeln("eps = " ~ to!string(eps) ~ ", max_f = " ~ to!string(f));
    return 0;
}

C++ version works as expected and finds that f + e == f when f = 16777216.

But D version hungs forever. When I put breakpoint I see that in D version f also 16777216 (after running for some time) and Watch window (I use VisualD) shows that (f + e != f) is 'false' so the loop should be terminate but it's not the case during runtime.

I think assembly could give the answer but I'm not very good with it.

I'm new to D, so it should be the case that I misused the language/compiler (compiled with DMD just as 'dmd test.d' without additional options and also from VS with VisualD with default options). Any ideas what could be wrong with D version of the program? Thanks!

Disassembly:

C++:

000000013F7D1410  mov         rax,rsp  
000000013F7D1413  push        rbp  
000000013F7D1414  lea         rbp,[rax-5Fh]  
000000013F7D1418  sub         rsp,0E0h  
000000013F7D141F  mov         qword ptr [rbp+17h],0FFFFFFFFFFFFFFFEh  
000000013F7D1427  mov         qword ptr [rax+8],rbx  
000000013F7D142B  movaps      xmmword ptr [rax-18h],xmm6  
000000013F7D142F  xorps       xmm1,xmm1  
    float eps = 1.0f;
    float f = 0.0f;
000000013F7D1432  movss       xmm6,dword ptr [__real@3f800000 (013F7D67E8h)]  
000000013F7D143A  nop         word ptr [rax+rax]  
        f += 1.0f;
000000013F7D1440  addss       xmm1,xmm6  
    while (f + eps != f)
000000013F7D1444  movaps      xmm0,xmm1  
000000013F7D1447  addss       xmm0,xmm6  
000000013F7D144B  ucomiss     xmm0,xmm1  
000000013F7D144E  jp          main+30h (013F7D1440h)  
000000013F7D1450  jne         main+30h (013F7D1440h)  

D:

000000013F761002  mov         ebp,esp  
000000013F761004  sub         rsp,50h  
{
    float eps = 1.0f;
000000013F761008  xor         eax,eax  
000000013F76100A  mov         dword ptr [rbp-50h],eax  
000000013F76100D  movss       xmm0,dword ptr [rbp-50h]  
000000013F761012  movss       dword ptr [f],xmm0  
    float f = 0.0f;
    while (f + eps != f)
        f += 1.0f;
000000013F761017  movss       xmm1,dword ptr [__NULL_IMPORT_DESCRIPTOR+1138h (013F7C3040h)]  
000000013F76101F  movss       xmm2,dword ptr [f]  
000000013F761024  addss       xmm2,xmm1  
000000013F761028  movss       dword ptr [f],xmm2  
000000013F76102D  fld         dword ptr [f]  
000000013F761030  fadd        dword ptr [__NULL_IMPORT_DESCRIPTOR+1138h (013F7C3040h)]  
000000013F761036  fld         dword ptr [f]  
000000013F761039  fucomip     st,st(1)  
000000013F76103B  fstp        st(0)  
000000013F76103D  jne         D main+17h (013F761017h)  
000000013F76103F  jp          D main+17h (013F761017h)  

Summary

Accept harold's answer that program behavior is due to the mixed FPU and SSE usage.

Here's a summary what happens in D assembly snippet. In fact the loop will run forever.

SSE behaves strictly according to IEEE-754 when f reaches 16777216.0 and we add 1.0 to this value (f += 1.0f) we still obtain 16777216.0 in xmm2 register, then we store it to memory.

(f + eps != f) expression is computed on the FPU. Since FPU registers have enough precision (f+eps) results in 16777217.0. If we stored this result back to memory into float variable then we would get expected value 16777216.0 (since 16777217.0 is not represented as float). And (f + eps != f) would be 'false' and loop would terminate. But we do not store any numbers back to memory and perform comparison on the FPU (since we have both operands). It means that we compare one number that is computed strictly according to IEEE-754 (f) and another that is computed with 80bit accuracy (f+eps). 16777216.0 != 16777217.0 and the loop runs forever.

I'm not an expert in this area but for me it looks like that doing floating point with SSE instructions is more robust as was demonstrated in C++ version of the program.

Update

I had a discussion on the D forum http://forum.dlang.org/thread/[email protected]

It turned out that program behaves correctly - it's according to the language specification that intermediate calculations can be performed with higher accuracy.

The robust implementation for any D compiler is:

import std.stdio;
int main()
{
    const float eps = 1.0f;
    const float step = 1.0;

    float f = 0.0f;
    float fPlusEps = f + eps;
    while (f != fPlusEps)
    {
        f += step;
        fPlusEps = f + eps;
    }
    writeln("eps = ", eps, ", max_f = ", f);
    return 0;
}
like image 673
Artem Kharytoniuk Avatar asked Feb 07 '15 09:02

Artem Kharytoniuk


2 Answers

Mixed FPU and SSE code, that's .. really strange. I see absolutely no reason to implement it this way.

But they have, and the result is that f + eps != f is evaluated with 80bit extended precision, while
f += 1.0f is evaluated using 32bit floats.

That means the loop can never end, since f will stop going up before the value that makes
f + eps != f false (which, in 80bit precision, is huge) is reached.

like image 99
harold Avatar answered Sep 30 '22 15:09

harold


Trying to break a loop with != or == with floating point values is looking for troubles.

The different behavior is mot likely due to the float to double to 80-bits internal floating point conversion compiler may adopt when passing values to the FPU.

When extending the mantissa, in particular- some compilers or optimizer can decide to let the less significant bit "random" instead of zeroed. So 1.0f, when given to the FPU may become 1.000000000000000000000012134432 that -according to a float- precision, is still 1.0, but wen 1.000000000000000000000012134432 and 1.000000000000000000000089544455 (the two tail are random) are compared by the FPU, look different.

You should verify how C++ and D compiler treat the floating point extension/reduction and eventually configure the appropriate switches: if the two compilers are not from the same manufacturer, thay had probably made different choices for their respective defaults.

like image 38
Emilio Garavaglia Avatar answered Sep 30 '22 14:09

Emilio Garavaglia