Floating point limits (double) defined with long double suffix L

Tags:

gcc

1. Question:

I have a question about the DBL_MAX and DBL_MIN definition in Linux with gcc v4.8.5.
They are defined in float.h as:

#define DBL_MAX     __DBL_MAX__
#define DBL_MIN     __DBL_MIN__

where __DBL_MIN__ and __DBL_MAX__ are compiler specific and can be obtained by:

$ gcc -dM -E - < /dev/null
...
#define __DBL_MAX__ ((double)1.79769313486231570815e+308L)
#define __DBL_MIN__ ((double)2.22507385850720138309e-308L)
...

My question is:
Why are the values defined as long double with the suffix L and then cast back to a double?

2. Question:

Why is __DBL_MIN_10_EXP__ defined as -307 when the minimum exponent, as used above in the DBL_MIN macro, is -308? For the maximum exponent the value is 308, which I can understand because it is the exponent used by the DBL_MAX macro.

#define __DBL_MAX_10_EXP__ 308
#define __DBL_MIN_10_EXP__ (-307)

Not part of the question, just observations I made:

By the way, on Windows with Visual Studio 2015 only the DBL_MAX and DBL_MIN macros are defined, without the compiler-specific redirection to the versions with underscores. Furthermore, the minimum positive double value DBL_MIN and the maximum double value DBL_MAX are slightly greater than the values from my Linux gcc compiler (compared with the gcc v4.8.5 macros above):

#define DBL_MAX        1.7976931348623158e+308
#define DBL_MIN        2.2250738585072014e-308

Moreover, the Microsoft compiler sets the long double limits to the values of a double; it seems that it doesn't support a real long double implementation.


asked Jun 28 '17 12:06

Andre Kampling


2 Answers

Specifying binary floating point numbers in decimal has subtle issues.

Why are the values defined as long double with the suffix L and then cast back to a double?

With typical binary64, the maximum finite value is about 1.797e+308, or exactly:

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368

The number of significant decimal digits needed to identify a double uniquely may be as many as DBL_DECIMAL_DIG (typically 17 and at least 10). In any case, using exponential notation is certainly clear without being overly precise.

/*
1 2345678901234567 */          // Sorted 
1.79769313486231550856124...   // DBL_MAX next smallest for reference
1.79769313486231570814527...   // Exact
1.79769313486231570815e+308L   // gcc
1.7976931348623158e+308        // VS (just a hair closer to exact than "next largest")
1.7976931348623159077293...    // DBL_MAX next largest if not limited by range

Various compilers may not convert such a string exactly as hoped, sometimes ignoring some of the least significant digits, although this behavior is controlled by the compiler.

Another source of subtle conversion differences, and I expect this is why the 'L' is added: the conversion to double is affected by the processor's floating point unit, which might not adhere exactly to the IEEE standard. The worst result would be that the 1.797...e+308 constant converts to infinity because of minute errors when the text is converted to a double using double math. When the text is converted to a long double instead, those conversion errors are very small relative to a double. Converting the long double result to double then rounds to the hoped-for number.

In short, forcing L math ensures the constant is not inadvertently made an infinity.

I would expect the following, which matches neither gcc nor VS, to be sufficient with an FPU that complies with IEEE 754.

#define __DBL_MAX__ 1.7976931348623157e+308

The cast back to double is there to make DBL_MAX a double. This meets the expectation of much existing code that DBL_MAX is a double and not a long double. I see no specification that requires this, though.

Why is the DBL_MIN_10_EXP defined with -307 but the minimum exponent is -308?

That is to comply with the definition of DBL_MIN_10_EXP: "... minimum negative integer such that 10 raised to that power is in the range of normalized floating-point numbers". The non-integer answer, log10(DBL_MIN), is about -307.65 and so lies between -307 and -308; the minimum integer in range is therefore -307.

Regarding the observations:

Although VS treats long double as a distinct type, the same encoding as double is used, thus there is no numeric advantage in using L.


answered Sep 20 '22 17:09

chux - Reinstate Monica

I don't know why the L suffix is used.

This site has an overview of IEEE 754 floating point.

The exponent is 11 bits, with an offset of 1023. However exponents of 0 and 2047 are reserved for special numbers. So this means that the exponent can vary from 2046-1023=1023 to 1-1023=-1022.

So for the max normalized value we have an exponent of 2^1023. The max value for the mantissa is just below 2 (1.111 etc with 52 1s after the point, in binary), which gives a value of ~2*2^1023 = ~1.79e308.

For the min normalized value we have an exponent of 2^-1022. The min mantissa is exactly 1 giving us a value of 1*2^-1022 = ~2.22e-308. So far so good.

DBL_MIN_10_EXP and DBL_MAX_10_EXP are the min/max exponents of 10 that are normalized. For the max 1e308 is less than ~1.79e308 so the value is 308. For the min, 1e-308 is too small - it is lower than ~2.22e-308. 1e-307 is greater than ~2.22e-308 so the value is -307.

answered Sep 20 '22 17:09

Paul Floyd
