1. Question:
I have a question about the DBL_MAX
and DBL_MIN
definition in Linux with gcc v4.8.5.
They are defined in float.h
as:
#define DBL_MAX __DBL_MAX__
#define DBL_MIN __DBL_MIN__
where __DBL_MIN__
and __DBL_MAX__
are compiler specific and can be obtained by:
$ gcc -dM -E - < /dev/null
...
#define __DBL_MAX__ ((double)1.79769313486231570815e+308L)
#define __DBL_MIN__ ((double)2.22507385850720138309e-308L)
...
My question is:
Why are the values defined as long double
with suffix L
and then cast back to a double
?
2. Question:
Why is the __DBL_MIN_10_EXP__
defined with -307
but the minimum exponent is -308
as it is used above in the DBL_MIN
macro? In the case of the maximum exponent it is defined with 308
which I can understand as it is used by the DBL_MAX
macro.
#define __DBL_MAX_10_EXP__ 308
#define __DBL_MIN_10_EXP__ (-307)
Not part of the question, just observations I made:
By the way using Windows with Visual Studio 2015 there are just the DBL_MAX
and DBL_MIN
macros defined without the compiler specific redirection to the versions with the underscore. Further the minimum positive double value DBL_MIN
and the maximum double value DBL_MAX
are a little bit greater than the values from my Linux gcc compiler (just compared to the defined macros from gcc v4.8.5 above):
#define DBL_MAX 1.7976931348623158e+308
#define DBL_MIN 2.2250738585072014e-308
Moreover, the Microsoft compiler sets the long double limits to the values of a double, so it seems that it doesn't support a real long double implementation.
To represent floating point numbers, we use float, double and long double. What's the difference? double has roughly twice the precision of float. float is a 32-bit IEEE 754 single-precision floating-point number: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the fraction. float has about 7 decimal digits of precision.
Explanation: The floating point data types are called real data types. Hence float, double, and long double are real data types.
In C and related programming languages, long double refers to a floating-point data type that is often more precise than double precision though the language standard only requires it to be at least as precise as double . As with C's other floating-point types, it may not necessarily map to an IEEE format.
sizeof(long double) is 16 (aka 128 bits) in Intel Macs for alignment purposes but is actually 80 bit precision according to their documentation.
Specifying binary floating point numbers in decimal has subtle issues.
Why are the values defined as long double with suffix L and then cast back to a double?
With typical binary64, the maximum finite value is about 1.798e+308, or exactly:
179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
The numbers of digits needed to convert to a unique double
may be as many as DBL_DECIMAL_DIG
(typically 17 and at least 10). In any case, using exponential notation is certainly clear without being overly precise.
/*
1 2345678901234567 */ // Sorted
1.79769313486231550856124... // DBL_MAX next smallest for reference
1.79769313486231570814527... // Exact
1.79769313486231570815e+308L // gcc
1.7976931348623158e+308 // VS (just a hair closer to exact than "next largest")
1.7976931348623159077293.... // DBL_MAX next largest if not limited by range
Various compilers may not convert this string exactly as hoped, sometimes ignoring some of the least significant digits, although this is controlled by the compiler.
Another source of subtle conversion differences, and I expect this is why the L suffix is added: the conversion to double is affected by the processor's floating-point unit, which might not adhere exactly to the IEEE standard. The worst result would be the 1.797...e+308 constant converting to infinity because of minute errors when the conversion code itself uses double math. When the text is converted to a long double first, those conversion errors are very small relative to double precision. Converting the long double result to double then rounds to the hoped-for number.
In short, forcing L math ensures the constant is not inadvertently made an infinity.
I would expect the following which matches neither gcc nor VS to be sufficient with a compliant IEEE 754 standard FPU.
#define __DBL_MAX__ 1.7976931348623157e+308
The cast back to double
is to make DBL_MAX
a double
. This would meet many code's expectations that a DBL_MAX
is a double
and not a long double
. I see no specification that requires this though.
Why is the DBL_MIN_10_EXP defined with -307 but the minimum exponent is -308?
That is to comply with the definition of DBL_MIN_10_EXP
. "... minimum negative integer such that 10 raised to that power is in the range of normalized floating-point numbers" The non-integer answer is between -307 and -308, so the minimum integer in range is -307.
observation part
Although VS treats long double
as a distinct type, the same encoding as double
is used, thus there is no numeric advantage in using L
.
I don't know why the L suffix is used.
This site has an overview of IEEE 754 floating point.
The exponent is 11 bits, with an offset of 1023. However exponents of 0 and 2047 are reserved for special numbers. So this means that the exponent can vary from 2046-1023=1023 to 1-1023=-1022.
So for the max normalized value we have an exponent of 2^1023. The max value for the mantissa is just below 2 (1.111 etc with 52 1s after the point, in binary) which is ~2*2^1023 = ~1.79e308.
For the min normalized value we have an exponent of 2^-1022. The min mantissa is exactly 1 giving us a value of 1*2^-1022 = ~2.22e-308. So far so good.
DBL_MIN_10_EXP and DBL_MAX_10_EXP are the min/max exponents of 10 that are normalized. For the max 1e308 is less than ~1.79e308 so the value is 308. For the min, 1e-308 is too small - it is lower than ~2.22e-308. 1e-307 is greater than ~2.22e-308 so the value is -307.