I discovered on my x86 VM (32 bit) that the following program:
#include <stdio.h>

void foo (long double x) {
    int y = x;
    printf("(int)%Lf = %d\n", x, y);
}

int main () {
    foo(.9999999999999999999728949456878623891498136799780L);
    foo(.999999999999999999972894945687862389149813679978L);
    return 0;
}
Produces the following output:
(int)1.000000 = 1
(int)1.000000 = 0
Ideone also produces this behavior.
What is the compiler doing to allow this to happen?
I found this constant as I was tracking down why the following program didn't produce 0 as I expected (using 19 9s produced the 0 I expected):
#include <stdio.h>

int main () {
    long double x = .99999999999999999999L; /* 20 9's */
    int y = x;
    printf("%d\n", y);
    return 0;
}
As I tried to compute the value at which the result switches from expected to unexpected, I arrived at the constant this question is about.
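For reference, a small diagnostic sketch like the following (not part of the original programs) can show which representable value each literal actually became; the exact output depends on the platform's long double format:

#include <stdio.h>

int main () {
    /* Print the stored values at full precision; %La shows the exact
       hexadecimal floating-point value each literal was converted to. */
    long double a = .9999999999999999999728949456878623891498136799780L;
    long double b = .999999999999999999972894945687862389149813679978L;
    printf("a = %.25Lf (%La)\n", a, a);
    printf("b = %.25Lf (%La)\n", b, b);
    return 0;
}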
Floating-point decimal values generally do not have an exact binary representation. This is a side effect of how the CPU represents floating point data. For this reason, you may experience some loss of precision, and some floating-point operations may produce unexpected results.
Floating-point calculations are inexact mainly because they approximate rationals that cannot be represented finitely in base 2, and more generally because they approximate numbers that may not be representable in finitely many digits in any base.
It's a problem caused by the internal representation of floating-point numbers, which uses a fixed number of binary digits to represent a decimal number. Some decimal numbers are hard to represent in binary, so in many cases this leads to small round-off errors.
It's broken in the exact same way the decimal (base-10) notation you learned in grade school and use every day is broken, just for base-2. To understand, think about representing 1/3 as a decimal value. It's impossible to do exactly!
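A minimal sketch of the same idea in C (the exact digits printed depend on the platform, but the effect is the same on any machine with IEEE binary doubles):

#include <stdio.h>

int main () {
    /* 0.1 has no finite base-2 representation, just as 1/3 has no finite
       base-10 representation, so the nearest double gets stored instead. */
    double tenth = 0.1;
    printf("%.20f\n", tenth);          /* prints roughly 0.10000000000000000555 */
    printf("%d\n", 0.1 + 0.2 == 0.3);  /* usually 0: the rounding errors differ */
    return 0;
}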
Your problem is that long double on your platform has insufficient precision to store the exact value 0.99999999999999999999, so that value must be converted to a representable value (this conversion happens during translation of your program, not at runtime).
This conversion can produce either the nearest representable value, or the next greater or smaller representable value. The choice is implementation-defined, so your implementation should document which it is using. It seems that your implementation uses an x87-style 80-bit long double and rounds to the nearest value, resulting in the value 1.0 being stored in x.
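As a sketch (assuming the x87 80-bit format described above), you can check at runtime which value the literal was converted to:

#include <stdio.h>
#include <float.h>

int main () {
    /* If the compile-time conversion rounded to nearest on an 80-bit
       long double, the 20-nines literal ends up exactly equal to 1.0L. */
    long double x = .99999999999999999999L;  /* 20 9's */
    printf("LDBL_MANT_DIG = %d\n", LDBL_MANT_DIG);  /* 64 for x87 long double */
    printf("x == 1.0L ? %d\n", x == 1.0L);
    printf("x = %.30Lf\n", x);
    return 0;
}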
With the assumed format for long double (with 64 mantissa bits), the highest representable number less than 1.0 is, in hexadecimal:
0x0.ffffffffffffffff
The number exactly halfway between this value and the next higher representable number (1.0) is:
0x0.ffffffffffffffff8
Your very long constant 0.9999999999999999999728949456878623891498136799780 is equal to:
0x0.ffffffffffffffff7fffffffffffffffffffffffa1eb2f0b64cf31c113a8ec...
which should obviously be rounded down if rounding to nearest, but you appear to have reached some limit of the floating point representation your compiler is using, or a rounding bug.
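If you want to inspect these values on your own machine, a small sketch using nextafterl (link with -lm) prints the largest representable long double below 1.0 and what it converts to as an int; on an x87 long double this is the 0x0.ffffffffffffffff value above, though the %La output may be written differently, e.g. 0xf.fffffffffffffffp-4:

#include <stdio.h>
#include <math.h>

int main () {
    /* nextafterl(1.0L, 0.0L) is the largest representable long double
       strictly below 1.0; %La prints its exact hexadecimal value. */
    long double below_one = nextafterl(1.0L, 0.0L);
    printf("largest value < 1.0: %La = %.25Lf\n", below_one, below_one);
    printf("converted to int:    %d\n", (int)below_one);
    return 0;
}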
The compiler uses binary floating-point numbers, and most compilers do the same thing.
According to WolframAlpha, the binary representation of 0.99999999999999999999 looks like this:
0.11111111111111111111111111111111111111111111111111111111111111111101000011000110101111011110011011011011011110111011100101000101010111011100001011010001001110001101011001010000110000101001111011111001111110000101010111111110100110000010001001101011001101010110110010010101101111101001110001100111101100000000100110110001100110000011000100100011000011110100001000000100001000101000111011010111111101011010010000010110011111110100100110001011001110100011100001111101011110101001000000111110010000101101001001010110010011001110111111100111101111100000111010001101101011000100110001010010001000100010110000101110100101010101001010100010001001100111111111001001101100000000010010001011110100101011101001001101001111001001000101011101001100111101110111111001101110100111000001111101101101101101110100100111101000000000111101101101001000111101100010101110011101110001110010110110111101000011110110100011000110101100011111111110111000010010001111000000000101100101000100101110100001001101000010110101000100011100000110010001110101...
That's 932 bits, and that STILL isn't enough to precisely represent your number (see dots at the end).
Which means that as long as your underlying platform uses base 2 to store numbers, you will not be able to store 0.99999999999999999999 exactly.
Because the number cannot be stored precisely, it will be rounded up or down. With 20 9s it ends up being rounded up, and with 19 9s it ends up being rounded down.
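Here is a small sketch that makes the 19-vs-20 nines difference visible directly (the results assume an x87-style 80-bit long double; a platform with a wider long double could round both literals down):

#include <stdio.h>

int main () {
    long double nineteen = .9999999999999999999L;   /* 19 9's */
    long double twenty   = .99999999999999999999L;  /* 20 9's */

    /* On an 80-bit long double the 20-nines literal is within half an ulp
       of 1.0 and rounds up; the 19-nines literal is not, and rounds down. */
    printf("19 nines == 1.0L ? %d\n", nineteen == 1.0L);  /* typically 0 */
    printf("20 nines == 1.0L ? %d\n", twenty == 1.0L);    /* typically 1 */
    printf("(int)nineteen = %d\n", (int)nineteen);        /* typically 0 */
    printf("(int)twenty   = %d\n", (int)twenty);          /* typically 1 */
    return 0;
}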
To avoid this problem, instead of doubles you'll need to use some kind of third-party mathematics/bignum library that stores numbers internally using a decimal base (e.g. two decimal digits per byte) or uses fractions (ratios) instead of floating-point numbers. That would solve your problem.
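As a toy illustration of the fractions idea (the ratio type and function below are made up for this example, not a real library): storing the value as an exact ratio of integers avoids the rounding entirely. A real solution for 20-digit values would need a bignum library such as GMP, since the full-precision numerator and denominator would overflow 64-bit integers.

#include <stdio.h>

/* Hypothetical minimal "ratio" type: an exact fraction of two integers. */
typedef struct { long long num; long long den; } ratio;

static int ratio_to_int (ratio r) {
    return (int)(r.num / r.den);  /* exact truncation toward zero */
}

int main () {
    /* 0.9999999999999999 stored exactly as 9999999999999999/10000000000000000
       (fewer 9's than in the question, so it fits in 64-bit integers). */
    ratio r = { 9999999999999999LL, 10000000000000000LL };
    printf("%d\n", ratio_to_int(r));  /* prints 0, with no rounding involved */
    return 0;
}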