Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Floating point arithmetic and machine epsilon

I'm trying to compute an approximation of the epsilon value for the float type (and I know it's already in the standard library).

The epsilon values on this machine are (printed with some approximation):

 FLT_EPSILON = 1.192093e-07
 DBL_EPSILON = 2.220446e-16
LDBL_EPSILON = 1.084202e-19

FLT_EVAL_METHOD is 2 so everything is done in long double precision, and float, double and long double are 32, 64 and 96 bit.

I tried to get an approximation of the value starting from 1 and dividing it by 2 until it becomes too small, doing all operation with the float type:

# include <stdio.h>

int main(void)
{
    float floatEps = 1;

    while (1 + floatEps / 2 != 1)
        floatEps /= 2;

    printf("float eps = %e\n", floatEps);
}

The output is not what I was looking for:

float epsilon = 1.084202e-19

Intermediate operations are done with the greatest precision (due to the value of FLT_EVAL_METHOD), so this result seems legit.

However, this:

// 2.0 is a double literal
while ((float) (1 + floatEps / 2.0) != 1)
    floatEps /= 2;

gives this output, which is the right one:

float epsilon = 1.192093e-07

but this one:

// no double literals
while ((float) (1 + floatEps / 2) != 1)
    floatEps /= 2;

leads again to a wrong result, as the first one:

float epsilon = 1.084202e-19

These last two versions should be equivalent on this platform, is this a compiler bug? If not, what's happening?

Code is compiled with:

gcc -O0 -std=c99 -pedantic file.c

The gcc version is pretty old, but I'm at university and I can't update it:

$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.5-8'
--with-bugurl=file:///usr/share/doc/gcc-4.4/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.4
--enable-shared --enable-multiarch --enable-linker-build-id --with-system-zlib
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--with-gxx-include-dir=/usr/include/c++/4.4 --libdir=/usr/lib --enable-nls
--enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc
--enable-targets=all --with-arch-32=i586 --with-tune=generic
--enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu
--target=i486-linux-gnu
Thread model: posix
gcc version 4.4.5 (Debian 4.4.5-8)

Current version of gcc, 4.7, behaves correctly on my home computer. There are also comments saying that different versions give different results.

After some answers and comments, that clarified what is behaving as expected and what's not, I changed the question a little to make it clearer.

like image 675
effeffe Avatar asked Apr 17 '13 15:04

effeffe


1 Answers

The compiler is allowed to evaluate float expressions in any bigger precision it likes, so it looks like the first expression is evaluated in long double precision. In the second expression you enforce scaling the result down to float again.

In answer to some of your additional questions and the discussion below: you are basically looking for the smallest non-zero difference with 1 of some floating point type. Depending on the setting of FLT_EVAL_METHOD a compiler may decide to evaluate all floating point expressions in a higher precision than the types involved. On a Pentium traditionally the internal registers of the floating point unit are 80 bits and it is convenient to use that precision for all the smaller floating point types. So in the end your test depends on the precision of your compare !=. In the absence of an explicit cast the precision of this comparison is determined by your compiler not by your code. With the explicit cast you scale the comparison down to the type you desire.

As you confirmed your compiler has set FLT_EVAL_METHOD to 2 so it uses the highest precision for any floating point calculation.

As a conclusion to the discussion below we are confident to say that there is a bug relating to implementation of the FLT_EVAL_METHOD=2 case in gcc prior to version 4.5 and that is fixed from of at least version 4.6. If the integer constant 2 is used in the expression instead of the floating point constant 2.0, the cast to float is omitted in the generated assembly. It is also worth noticing that from of optimization level -O1 the right results are produced on these older compilers, but the generated assembly is quite different and contains only few floating point operations.

like image 124
Bryan Olivier Avatar answered Sep 18 '22 23:09

Bryan Olivier