
Understanding casts from integer to float

Could someone explain this weird-looking output on a 32-bit machine?

#include <stdio.h>

int main() {
  printf("16777217 as float is %.1f\n",(float)16777217);
  printf("16777219 as float is %.1f\n",(float)16777219);

  return 0;
}

Output

16777217 as float is 16777216.0
16777219 as float is 16777220.0

The weird thing is that 16777217 casts to a lower value and 16777219 casts to a higher value...

Asked May 13 '18 by zzz_zzz


2 Answers

In the IEEE-754 basic 32-bit binary floating-point format, all integers from −16,777,216 to +16,777,216 are representable. From 16,777,216 to 33,554,432, only even integers are representable. Then, from 33,554,432 to 67,108,864, only multiples of four are representable. (Since the question does not necessitate discussion of which numbers are representable, I will omit explanation and just take this for granted.)
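
If you want to see this on your own machine, here is a small spot-check (my addition, not part of this answer): it converts a few integers from each of those ranges to float and back and reports whether the round trip was exact.

#include <stdio.h>

int main(void) {
  /* sample values around 2^24 = 16,777,216 and 2^25 = 33,554,432 */
  long tests[] = {16777215, 16777216, 16777217, 16777218,
                  33554431, 33554432, 33554433, 33554434, 33554436};
  int count = (int)(sizeof tests / sizeof tests[0]);
  for (int i = 0; i < count; i++) {
    float f = (float)tests[i];   /* rounds to the nearest representable float */
    long back = (long)f;         /* convert back to an integer */
    printf("%ld -> %.1f (%s)\n", tests[i], f,
           back == tests[i] ? "exact" : "rounded");
  }
  return 0;
}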

The most common default rounding mode is to round the exact mathematical result to the nearest representable value and, in case of a tie, to round to the representable value which has zero in the low bit of its significand.

16,777,217 is equidistant between the two representable values 16,777,216 and 16,777,218. These values are represented as 100000000000000000000000₂ • 2¹ and 100000000000000000000001₂ • 2¹. The former has 0 in the low bit of its significand, so it is chosen as the result.

16,777,219 is equidistant between the two representable values 16,777,218 and 16,777,220. These values are represented as 100000000000000000000001₂ • 2¹ and 100000000000000000000010₂ • 2¹. The latter has 0 in the low bit of its significand, so it is chosen as the result.
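
For the curious, here is a short program (my addition, not part of this answer) that reproduces both ties and uses nextafterf from <math.h> to show the representable neighbors; compile with -lm if your toolchain needs it.

#include <stdio.h>
#include <math.h>

int main(void) {
  float a = (float)16777217;  /* tie between 16,777,216 and 16,777,218; the value whose significand ends in 0 wins */
  float b = (float)16777219;  /* tie between 16,777,218 and 16,777,220; again the even-significand value wins */
  printf("16777217 -> %.1f\n", a);
  printf("16777219 -> %.1f\n", b);

  /* the representable floats adjacent to 16,777,216 */
  printf("next float above 16777216 is %.1f\n", nextafterf(16777216.0f, INFINITY));
  printf("next float below 16777216 is %.1f\n", nextafterf(16777216.0f, -INFINITY));
  return 0;
}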

Answered Oct 03 '22 by Eric Postpischil


You may have heard of the concept of "precision", as in "this fractional representation has 3 digits of precision".

This is very easy to think about in a fixed-point representation. If I have, say, three digits of precision past the decimal, then I can exactly represent 1/2 = 0.5, and I can exactly represent 1/4 = 0.25, and I can exactly represent 1/8 = 0.125, but if I try to represent 1/16, I can not get 0.0625; I will either have to settle for 0.062 or 0.063.

But that's for fixed-point. The computer you're using uses floating-point, which is a lot like scientific notation. You get a certain number of significant digits total, not just digits to the right of the decimal point. For example, if you have 3 decimal digits worth of precision in a floating-point format, you can represent 0.123 but not 0.1234, and you can represent 0.0123 and 0.00123, but not 0.01234 or 0.001234. And if you have digits to the left of the decimal point, those take away from the number you can use to the right of the decimal point. You can use 1.23 but not 1.234, and 12.3 but not 12.34, and 123.0 but not 123.4 or 123.anythingelse.

And -- you can probably see the pattern by now -- if you're using a floating-point format with only three significant digits, you can't represent all numbers greater than 999 perfectly accurately at all, even though they don't have a fractional part. You can represent 1230 but not 1234, and 12300 but not 12340.
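
To make that concrete, here is a toy helper (purely illustrative, my addition; this is not something your compiler does) that rounds a value to 3 significant decimal digits and reproduces the 1234 -> 1230 and 12340 -> 12300 behavior described above.

#include <stdio.h>
#include <math.h>

/* hypothetical helper: round x to 3 significant decimal digits */
static double round_to_3_sig(double x) {
  if (x == 0.0) return 0.0;
  double scale = pow(10.0, 2 - floor(log10(fabs(x))));
  return round(x * scale) / scale;
}

int main(void) {
  printf("1234   -> %g\n", round_to_3_sig(1234));    /* 1230 */
  printf("12340  -> %g\n", round_to_3_sig(12340));   /* 12300 */
  printf("0.1234 -> %g\n", round_to_3_sig(0.1234));  /* 0.123 */
  return 0;
}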

So that's decimal floating-point formats. Your computer, on the other hand, uses a binary floating-point format, which ends up being somewhat trickier to think about. We don't have an exact number of decimal digits' worth of precision, and the numbers that can't be exactly represented don't end up being nice even multiples of 10 or 100.

In particular, type float on most machines has 24 binary bits worth of precision, which works out to 6-7 decimal digits' worth of precision. That's obviously not enough for numbers like 16777217.
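
You can check those numbers for your own compiler with the constants in <float.h> (a quick sketch, not part of the answer):

#include <stdio.h>
#include <float.h>

int main(void) {
  printf("float significand bits: %d\n", FLT_MANT_DIG);  /* typically 24 */
  printf("float decimal digits:   %d\n", FLT_DIG);       /* typically 6 */
  return 0;
}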

So where did the numbers 16777216 and 16777220 come from? As Eric Postpischil has already explained, it ends up being because they're multiples of 2. If we look at the binary representations of nearby numbers, the pattern becomes clear:

16777208     111111111111111111111000
16777209     111111111111111111111001
16777210     111111111111111111111010
16777211     111111111111111111111011
16777212     111111111111111111111100
16777213     111111111111111111111101
16777214     111111111111111111111110
16777215     111111111111111111111111
16777216    1000000000000000000000000
16777218    1000000000000000000000010
16777220    1000000000000000000000100

16777215 is the biggest number that can be represented exactly in 24 bits. After that, you can represent only even numbers, because the low-order bit is the 25th, and essentially has to be 0.
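
One more way to see the same thing (my addition): once you're past 2^24, adding 1.0f to a float doesn't change it, because the odd value in between isn't representable.

#include <stdio.h>

int main(void) {
  float f = 16777216.0f;   /* 2^24 */
  float g = f + 1.0f;      /* stored back into a float, so it must round: the +1 is lost */
  float h = f + 2.0f;      /* the even step is representable */
  printf("%.1f\n", g);     /* prints 16777216.0 */
  printf("%.1f\n", h);     /* prints 16777218.0 */
  return 0;
}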

Answered Oct 03 '22 by Steve Summit