Why does returning a floating-point value change its value?

The following code triggers the assert on Red Hat 5.4 32-bit but works on Red Hat 5.4 64-bit (or CentOS).

On 32-bit, I must put the return value of millis2seconds in a variable; otherwise the assert is raised, showing that the double returned from the function is different from the value computed inside it.

If you comment out the "#define BUG" line, it works.

Thanks to @R.., passing the -msse2 -mfpmath=sse options to the compiler makes both variants of the millis2seconds function work.

/*
 * TestDouble.cpp
 */

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static double millis2seconds(int millis) {
#define BUG
#ifdef BUG
    // The following does not work on 32-bit architectures, for any value of millis;
    // on 64-bit architectures it works.
    return (double)(millis) / 1000.0;
#else
    // On 32-bit architectures, we must do the operation in two steps ?!? ...
    // 1- compute the result into a local variable, and 2- return the local variable.
    // Why? Can somebody explain?
    double result = (double)(millis) / 1000.0;
    return result;
#endif
}

static void testMillis2seconds() {
    int millis = 10;
    double seconds = millis2seconds(millis);

    printf("millis                  : %d\n", millis);
    printf("seconds                 : %f\n", seconds);
    printf("millis2seconds(millis)  : %f\n", millis2seconds(millis));
    printf("seconds <  millis2seconds(millis)  : %d\n", seconds < millis2seconds(millis));
    printf("seconds >  millis2seconds(millis)  : %d\n", seconds > millis2seconds(millis));
    printf("seconds == millis2seconds(millis)  : %d\n", seconds == millis2seconds(millis));

    assert(seconds == millis2seconds(millis));
}

extern int main(int argc, char **argv) {
    testMillis2seconds();
}
asked Jun 03 '13 by armand nissim bendanan


3 Answers

With the cdecl calling convention, which is used on Linux x86 systems, a double is returned from a function in the st0 x87 register. All x87 registers hold 80-bit extended-precision values. With this code:

static double millis2seconds(int millis) {
    return (double)(millis) / 1000.0;
}

The compiler calculates the division using 80-bit precision. When gcc uses the GNU dialect of the standard (which it does by default), it leaves the result in the st0 register, so the full precision is returned to the caller. The end of the assembly code looks like this:

fdivrp  %st, %st(1)   # Divide st0 by st1 and store the result in st0
leave
ret                   # Return

With this code,

static double millis2seconds(int millis) {
    double result = (double)(millis) / 1000.0;
    return result;
}

the result is stored into a 64-bit memory location, which loses some precision. The 64-bit value is loaded back into the 80-bit st0 register before returning, but the damage is already done:

fdivrp  %st, %st(1)   # Divide st0 by st1 and store the result in st0
fstpl   -8(%ebp)      # Store st0 onto the stack
fldl    -8(%ebp)      # Load st0 back from the stack
leave
ret                   # Return

In your main, the first result is stored in a 64-bit memory location, so the extra precision is lost either way:

double seconds = millis2seconds(millis); 

but in the second call, the return value is used directly, so the compiler can keep it in a register:

assert(seconds == millis2seconds(millis)); 

When using the first version of millis2seconds, you end up comparing the value that has been rounded to 64-bit precision with the value at full 80-bit precision, so there is a difference.
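
To see the mismatch without any function call: 0.01 has no exact binary representation, so the nearest 80-bit long double and the nearest 64-bit double are different numbers. Here is a minimal sketch of that (my addition, not from the original answer); on targets where long double is the 80-bit x87 format it should print 0:

#include <stdio.h>

int main(void) {
    long double x80 = 10 / 1000.0L;  /* 0.01 rounded once, to 80-bit long double */
    double      x64 = 10 / 1000.0;   /* 0.01 rounded to 64-bit double */
    /* The two roundings of the same exact quotient differ */
    printf("equal: %d\n", x80 == (long double)x64);
    return 0;
}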

On x86-64, floating-point calculations are done with SSE instructions, which operate directly on 64-bit doubles with no excess precision, so this issue doesn't come up.

Also, if you use -std=c99 so that you don't get the GNU dialect, the calculated values are stored in memory and re-loaded into the register before returning so as to be standard-conforming.
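
For reference, newer gcc releases also expose this standard-conforming rounding directly through the -fexcess-precision=standard option; a strict conformance switch such as -std=c99 turns it on by default.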

answered by Vaughn Cato


On i386 (32-bit x86), all floating point expressions are evaluated as an 80-bit IEEE-extended floating point type. This is reflected in FLT_EVAL_METHOD, from float.h, being defined as 2. Storing the result to a variable or applying a cast to the result drops the excess precision via rounding, but that's still not sufficient to guarantee the same result you would see on an implementation (like x86_64) without excess precision, since rounding twice can give different results than performing a computation and rounding in the same step.
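
As a quick check, a minimal sketch (my addition, using only standard C99 float.h) reports which evaluation method a given build uses; on a classic i386 x87 build it should print 2, and on x86_64 it should print 0:

#include <float.h>
#include <stdio.h>

int main(void) {
    /* 0: expressions evaluated in their own type (e.g. x86_64 SSE);
       2: expressions evaluated as long double (classic i386 x87) */
    printf("FLT_EVAL_METHOD     = %d\n", (int)FLT_EVAL_METHOD);
    printf("sizeof(long double) = %zu\n", sizeof(long double));
    return 0;
}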

One way around this problem is to build using SSE math even on x86 targets, with -msse2 -mfpmath=sse.
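
For example, rebuilding the question's test program with g++ -m32 -msse2 -mfpmath=sse TestDouble.cpp should make both variants of millis2seconds pass, which matches the update in the question (the -m32 here is my assumption for reproducing the 32-bit case from a 64-bit host).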

answered by R.. GitHub STOP HELPING ICE


It's worth noting first of all that, since the function is implicitly pure and is called twice with a constant argument, the compiler would be within its rights to elide the computation and the comparison altogether.

clang-3.0-6ubuntu3 does eliminate the pure function call with -O9, and does all the floating-point calculations at compile time, so the program succeeds.
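
A minimal sketch (my addition) of how to keep the comparison alive at run time when experimenting with optimization levels, assuming the compiler honors volatile as a barrier to constant folding:

#include <assert.h>

static double millis2seconds(int millis) {
    return (double)(millis) / 1000.0;
}

int main(void) {
    volatile int millis = 10;  /* value is no longer a compile-time constant */
    double seconds = millis2seconds(millis);
    /* With constant folding blocked, the comparison really runs, so an
       i386 x87 build can fail here even at high optimization levels. */
    assert(seconds == millis2seconds(millis));
    return 0;
}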

The C99 standard, ISO/IEC 9899, says

The values of floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.

So the compiler is free to pass back an 80-bit value, as others have described. However, the standard goes on to say:

The cast and assignment operators are still required to perform their specified conversions.

This explains why specifically assigning to a double forces the value down to 64 bits while returning a double from a function does not. That is quite surprising to me.

However, it looks like the C11 standard will actually make this less confusing by adding this text:

If the return expression is evaluated in a floating-point format different from the return type, the expression is converted as if by assignment [which removes any extra range and precision] to the return type of the function and the resulting value is returned to the caller.

So this code is basically exercising unspecified behavior as to whether the value gets truncated or not at various points.
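
As an illustration of the quoted rule about casts (my sketch, not code from the question), explicitly converting the return value should stabilize the comparison under a compiler that implements standard excess-precision semantics, e.g. gcc -std=c99:

#include <assert.h>

static double millis2seconds(int millis) {
    return (double)(millis) / 1000.0;
}

int main(void) {
    double seconds = millis2seconds(10);
    /* The cast is required to perform its specified conversion, rounding
       the possibly-80-bit return value to a 64-bit double before the
       comparison, just as the assignment to seconds does. */
    assert(seconds == (double)millis2seconds(10));
    return 0;
}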


For me, on Ubuntu Precise, with -m32:

  • clang passes
  • clang -O9 also passes
  • gcc: the assertion fails
  • gcc -O9 passes, because it also eliminates the constant expressions
  • gcc -std=c99 fails
  • gcc -std=c1x also fails (but it may work on a later gcc)
  • gcc -ffloat-store passes, but seems to have the side-effect of constant elimination

I don't think this is a gcc bug, because the standard allows this behavior, but the clang behavior is nicer.

answered by poolie