Some C Floating Point Constants Don't Make Sense

Tags:

c

floating-point

constants

ieee-754

The constants in <float.h> for Apple clang version 12.0.0 (clang-1200.0.32.2) don't seem to make sense.

DBL_MIN_EXP is -1021 and DBL_MAX_EXP is 1024. However, that doesn't match what Wikipedia says: "exponents range from −1022 to +1023, ..."

Also, DBL_MIN_EXP seems inconsistent with DBL_MIN, which is 2.2250738585072014e-308, i.e. 2⁻¹⁰²² (sometimes written as 0x1.0000000000000p-1022). So we have an exponent smaller than the supposed minimum of -1021.

Likewise, DBL_MIN_10_EXP is -307, which doesn't make sense given that DBL_MIN has a decimal exponent of e-308.

And the DBL_MAX_EXP value of 1024 overflows when used in real code: for example, ldexp(1.0, 1024) gives inf.

Here's my C code:

#include <float.h>
#include <stdio.h>
#include <math.h>

#define SHOW_DOUBLE(s)   printf("%.17g \t%s\n", s, #s)
#define SHOW_INT(s)      printf("%d \t%s\n", s, #s)

int
main(void)
{
    SHOW_DOUBLE(DBL_MAX);
    SHOW_DOUBLE(DBL_MIN);
    SHOW_DOUBLE(DBL_EPSILON);
    SHOW_INT(DBL_MAX_EXP);
    SHOW_INT(DBL_MAX_10_EXP);
    SHOW_INT(DBL_MIN_EXP);
    SHOW_INT(DBL_MIN_10_EXP);
    SHOW_INT(DBL_DIG);
    SHOW_INT(DBL_MANT_DIG);
    SHOW_INT(FLT_RADIX);
    SHOW_INT(FLT_ROUNDS);
    printf("%lf\n", ldexp(1.0, 1024));
    return 0;
}

And here is the output:

1.7976931348623157e+308 DBL_MAX
2.2250738585072014e-308 DBL_MIN
2.2204460492503131e-16  DBL_EPSILON
1024                    DBL_MAX_EXP
308                     DBL_MAX_10_EXP
-1021                   DBL_MIN_EXP
-307                    DBL_MIN_10_EXP
15                      DBL_DIG
53                      DBL_MANT_DIG
2                       FLT_RADIX
1                       FLT_ROUNDS
inf


asked Oct 16 '20 20:10

Raymond Hettinger

2 Answers

The off-by-one is part of the spec. From the C standard, 5.2.4.2.2 Characteristics of floating types <float.h>, ¶11:

...

minimum negative integer such that FLT_RADIX raised to one less than that power is a normalized floating-point number, emin

FLT_MIN_EXP

DBL_MIN_EXP

LDBL_MIN_EXP

...

maximum integer such that FLT_RADIX raised to one less than that power is a representable finite floating-point number, emax

FLT_MAX_EXP

DBL_MAX_EXP

LDBL_MAX_EXP

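Emphasis on "one less than" is mine. So DBL_MIN = 2^(DBL_MIN_EXP − 1) = 2^−1022, and the largest finite power of two is 2^(DBL_MAX_EXP − 1) = 2^1023, which is why ldexp(1.0, 1024) overflows.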


answered Oct 14 '22 05:10

R.. GitHub STOP HELPING ICE

I asked myself the same question and realized that this stems from the fact that IEEE 754 and C use two different normalized forms:

IEEE 754 uses a significand between 1 (inclusive) and 2 (exclusive) for normal numbers:

x = sign × significand × 2^exponent, encoded as S|E|F where sign = (−1)^S; significand =

1.F if E_min ≤ E ≤ E_max (normal number),

0 if E = E_min − 1 and F = 0 (zero),

0.F if E = E_min − 1 and F ≠ 0 (subnormal number),

∞ if E = E_max + 1 and F = 0 (infinity),

NaN if E = E_max + 1 and F ≠ 0 (not a number);

exponent = max(E, E_min) − exponent bias.

C uses a significand′ between 0.5 (inclusive) and 1 (exclusive) for normal numbers:

x = sign′ × significand′ × 2^exponent′, encoded as S|E|F where sign′ = (−1)^S; significand′ =

0.1F if E_min ≤ E ≤ E_max (normal number),

0 if E = E_min − 1 and F = 0 (zero),

0.0F if E = E_min − 1 and F ≠ 0 (subnormal number),

∞ if E = E_max + 1 and F = 0 (infinity),

NaN if E = E_max + 1 and F ≠ 0 (not a number);

exponent′ = max(E + 1, E_min + 1) − exponent bias.

Since x = sign × significand × 2^exponent = sign′ × significand′ × 2^exponent′, the following relations between the IEEE 754 normalized form and the C normalized form hold:

sign′ = sign;
significand′ = significand/2;
exponent′ = exponent + 1.

In particular, in the binary64 format, this is why exponent_min = −1022 in IEEE 754 normalized form and exponent′_min = −1021 in C normalized form (DBL_MIN_EXP), and exponent_max = 1023 in IEEE 754 normalized form and exponent′_max = 1024 in C normalized form (DBL_MAX_EXP).

In Python:

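the IEEE 754 normalized significand and exponent of a positive finite x can be extracted with the following (math.log2 returns a rounded result, so for x just below a large power of two, such as 2**53 - 1, floor(math.log2(x)) can come out one too high):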

math.ldexp(x, -math.floor(math.log2(x))), math.floor(math.log2(x))

the C normalized significand′ and exponent′ can be extracted with:

math.frexp(x)

For instance:

>>> import math
>>> x = 12
>>> math.ldexp(x, -math.floor(math.log2(x))), math.floor(math.log2(x))
(1.5, 3)
>>> math.frexp(12)
(0.75, 4)

answered Oct 14 '22 05:10

Maggyero

