
Convert INT_MAX to float and then back to integer.

Tags: c, type-conversion, integer-overflow

In C programming, I found a weird problem that runs counter to my intuition. When I initialize an int to INT_MAX (2147483647, defined in limits.h) and implicitly convert it to a float, it works fine, i.e., the float value is the same as the maximum integer. But when I convert the float back to an int, something interesting happens: the new int becomes the minimum integer (-2147483648).
The source code looks like this:

int a = INT_MAX;
float b = a; // b is correct
int a_new = b; // a_new becomes INT_MIN

I am not sure what happens when the float b is converted to the int a_new. Is there a reasonable way to find the maximum value that can be converted back and forth between int and float?

PS: The value of INT_MAX - 100 works fine, but this is just an arbitrary workaround.

562

asked May 02 '14 04:05

houtoms

1 Answer

This answer assumes that float is an IEEE-754 single-precision value encoded in 32 bits, and that an int is 32 bits. See this Wikipedia article for more information about IEEE-754.

A float has only 24 bits of precision, compared with 31 value bits plus a sign bit for a 32-bit int. Therefore every int value from 0 to 16777216 (2^24) has an exact representation as a float, but values greater than 16777216 do not necessarily have one. The following code demonstrates this fact (on systems that use IEEE-754).

for ( int a = 16777210; a < 16777224; a++ )
{
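    float b = a;
    int c = b;
    unsigned int bits;
    memcpy( &bits, &b, sizeof bits );   // copy out the bit pattern (needs <string.h>); avoids the aliasing problem of *((int*)&b)
    printf( "a=%d c=%d b=0x%08x\n", a, c, bits );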
}

The expected output is

a=16777210 c=16777210 b=0x4b7ffffa
a=16777211 c=16777211 b=0x4b7ffffb
a=16777212 c=16777212 b=0x4b7ffffc
a=16777213 c=16777213 b=0x4b7ffffd
a=16777214 c=16777214 b=0x4b7ffffe
a=16777215 c=16777215 b=0x4b7fffff
a=16777216 c=16777216 b=0x4b800000
a=16777217 c=16777216 b=0x4b800000
a=16777218 c=16777218 b=0x4b800001
a=16777219 c=16777220 b=0x4b800002
a=16777220 c=16777220 b=0x4b800002
a=16777221 c=16777220 b=0x4b800002
a=16777222 c=16777222 b=0x4b800003
a=16777223 c=16777224 b=0x4b800004

Of interest here is that the float value 0x4b800002 is used to represent the three int values 16777219, 16777220, and 16777221, and thus converting 16777219 to a float and back to an int does not preserve the exact value of the int.
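
The point where gaps start to appear does not have to be hard-coded. Here is a minimal sketch that derives it from <float.h>, assuming a binary floating-point format (FLT_RADIX == 2, which holds for IEEE-754). The function name max_contiguous_float_int is made up for this illustration and is not a standard library function.

```c
#include <float.h>

/* Largest n such that every integer in [-n, n] is exactly representable
   as a float. Assumes FLT_RADIX == 2 and that FLT_MANT_DIG is smaller
   than the bit width of long. Hypothetical helper for illustration. */
long max_contiguous_float_int(void)
{
    /* FLT_MANT_DIG is 24 for IEEE-754 single precision */
    return 1L << FLT_MANT_DIG;
}
```

With IEEE-754 single precision this gives 16777216, the last value in the output above before two different int values start sharing one float.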

The two floating point values that are closest to INT_MAX are 2147483520 and 2147483648, which can be demonstrated with this code

for ( int a = 2147483520; a < 2147483647; a++ )
{
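    float b = a;
    int c = b;                          // out of range once b is 2147483648
    unsigned int bits;
    memcpy( &bits, &b, sizeof bits );   // copy out the bit pattern (needs <string.h>); avoids the aliasing problem of *((int*)&b)
    printf( "a=%d c=%d b=0x%08x\n", a, c, bits );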
}

The interesting parts of the output are

a=2147483520 c=2147483520 b=0x4effffff
a=2147483521 c=2147483520 b=0x4effffff
...
a=2147483582 c=2147483520 b=0x4effffff
a=2147483583 c=2147483520 b=0x4effffff
a=2147483584 c=-2147483648 b=0x4f000000
a=2147483585 c=-2147483648 b=0x4f000000
...
a=2147483645 c=-2147483648 b=0x4f000000
a=2147483646 c=-2147483648 b=0x4f000000

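Note that all 32-bit int values from 2147483584 to 2147483647 are rounded up to the float value 2147483648. That value is one more than INT_MAX, so converting it back to int overflows. The C standard leaves the result of such a conversion undefined, and the -2147483648 shown above is simply what this platform happens to produce. The largest int value that rounds down is 2147483583, which is the same as (INT_MAX - 64) when int is 32 bits.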
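
The hex patterns in the output can be checked by hand. The following is a rough sketch of that decoding, assuming the IEEE-754 single-precision layout (1 sign bit, 8 exponent bits biased by 127, 23 stored fraction bits). It only handles positive, normal values of at least 2^23. The function name decode_large_float_bits is hypothetical.

```c
#include <stdint.h>

/* Decode the bit pattern of a positive, normal IEEE-754 single whose
   biased exponent is at least 150, i.e. whose value is an integer >= 2^23.
   Hypothetical helper for illustration only. */
long long decode_large_float_bits(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFF;         /* biased by 127 */
    uint32_t fraction = bits & 0x7FFFFF;             /* 23 stored bits */
    long long significand = (1LL << 23) | fraction;  /* restore the implicit leading 1 */

    /* 150 = 127 (bias) + 23 (fraction bits) */
    return significand << (exponent - 150);
}
```

For 0x4effffff the exponent field is 157 and the significand is 16777215, so the value is 16777215 * 128 = 2147483520. For 0x4f000000 the exponent field is 158 and the significand is 8388608, so the value is 8388608 * 256 = 2147483648.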

One might therefore conclude that numbers no greater than (INT_MAX - 64) will safely convert from int to float and back to int, in the sense that the result does not overflow, although it may still be rounded to a different value. But that is only true on systems where an int is 32 bits and a float is encoded per IEEE-754.
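
To avoid depending on a hard-coded constant, one option is to test the round trip directly and guard the conversion back to int with a range check. This is only a sketch under stated assumptions: it assumes a two's complement int, so that (float)INT_MIN is an exact power of two, and both function names are invented for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if converting a to float and back yields a again.
   The range check comes first because converting an out-of-range
   float to int is undefined behavior in C.
   Assumes two's complement int, so (float)INT_MIN is exact. */
bool roundtrips_via_float(int a)
{
    float f  = (float)a;
    float lo = (float)INT_MIN;  /* -2147483648.0f for a 32-bit int */
    float hi = -lo;             /* first value that is too large for int */

    if (!(f >= lo && f < hi))
        return false;           /* f does not fit in an int */
    return (int)f == a;
}

/* Largest positive int that survives the round trip unchanged.
   Searches downward from INT_MAX; on typical systems only a
   handful of candidates are examined. */
int largest_roundtrip_int(void)
{
    for (int a = INT_MAX; a > 0; a--)
        if (roundtrips_via_float(a))
            return a;
    return 0;
}
```

With a 32-bit int and IEEE-754 single precision, largest_roundtrip_int() returns 2147483520, the largest float value below INT_MAX mentioned above. Values between 16777216 and that limit can still fail the check, so roundtrips_via_float has to be applied to each value individually.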

106

answered Oct 03 '22 21:10

user3386109
