Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

problems in floating point comparison [duplicate]

void main()
{
    float f = 0.98;
    if(f <= 0.98)
        printf("hi");
    else
        printf("hello");
    getch();
}

I am getting this problem here.On using different floating point values of f i am getting different results. Why this is happening?

like image 253
Adi Avatar asked Oct 18 '10 19:10

Adi


People also ask

Why is it a problem to compare two floating point numbers?

In the case of floating-point numbers, the relational operator (==) does not produce correct output, this is due to the internal precision errors in rounding up floating-point numbers. In the above example, we can see the inaccuracy in comparing two floating-point numbers using “==” operator.

What is a better way to compare floating point values?

To compare two floating point or double values, we have to consider the precision in to the comparison. For example, if two numbers are 3.1428 and 3.1415, then they are same up to the precision 0.01, but after that, like 0.001 they are not same.

Why should we never compare floating point values by using exact equality comparison?

Comparing for equality Simple values like 0.1 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations or the precision of intermediates can change the result.

What are the limitations of floating point representation?

As a result, they do not represent all of the same values, are not binary compatible, and have different associated error rates. Because of a lack of guarantees on the specifics of the underlying floating-point system, no assumptions can be made about either precision or range.


2 Answers

f is using float precision, but 0.98 is in double precision by default, so the statement f <= 0.98 is compared using double precision.

The f is therefore converted to a double in the comparison, but may make the result slightly larger than 0.98.

Use

if(f <= 0.98f)

or use a double for f instead.


In detail... assuming float is IEEE single-precision and double is IEEE double-precision.

These kinds of floating point numbers are stored with base-2 representation. In base-2 this number needs an infinite precision to represent as it is a repeated decimal:

0.98 = 0.1111101011100001010001111010111000010100011110101110000101000...

A float can only store 24 bits of significant figures, i.e.

       0.111110101110000101000111_101...
                                 ^ round off here
   =   0.111110101110000101001000

   =   16441672 / 2^24

   =   0.98000001907...

A double can store 53 bits of signficant figures, so

       0.11111010111000010100011110101110000101000111101011100_00101000...
                                                              ^ round off here
   =   0.11111010111000010100011110101110000101000111101011100

   =   8827055269646172 / 2^53

   =   0.97999999999999998224...

So the 0.98 will become slightly larger in float and smaller in double.

like image 79
kennytm Avatar answered Nov 04 '22 14:11

kennytm


It's because floating point values are not exact representations of the number. All base ten numbers need to be represented on the computer as base 2 numbers. It's in this conversion that precision is lost.

Read more about this at http://en.wikipedia.org/wiki/Floating_point


An example (from encountering this problem in my VB6 days)

To convert the number 1.1 to a single precision floating point number we need to convert it to binary. There are 32 bits that need to be created.

Bit 1 is the sign bit (is it negative [1] or position [0]) Bits 2-9 are for the exponent value Bits 10-32 are for the mantissa (a.k.a. significand, basically the coefficient of scientific notation )

So for 1.1 the single floating point value is stored as follows (this is truncated value, the compiler may round the least significant bit behind the scenes, but all I do is truncate it, which is slightly less accurate but doesn't change the results of this example):

s --exp--- -------mantissa--------
0 01111111 00011001100110011001100

If you notice in the mantissa there is the repeating pattern 0011. 1/10 in binary is like 1/3 in decimal. It goes on forever. So to retrieve the values from the 32-bit single precision floating point value we must first convert the exponent and mantissa to decimal numbers so we can use them.

sign = 0 = a positive number

exponent: 01111111 = 127

mantissa: 00011001100110011001100 = 838860

With the mantissa we need to convert it to a decimal value. The reason is there is an implied integer ahead of the binary number (i.e. 1.00011001100110011001100). The implied number is because the mantissa represents a normalized value to be used in the scientific notation: 1.0001100110011.... * 2^(x-127).

To get the decimal value out of 838860 we simply divide by 2^-23 as there are 23 bits in the mantissa. This gives us 0.099999904632568359375. Add the implied 1 to the mantissa gives us 1.099999904632568359375. The exponent is 127 but the formula calls for 2^(x-127).

So here is the math:

(1 + 099999904632568359375) * 2^(127-127)

1.099999904632568359375 * 1 = 1.099999904632568359375

As you can see 1.1 is not really stored in the single floating point value as 1.1.

like image 28
Matt Avatar answered Nov 04 '22 15:11

Matt