
Exact representation of floating-point numbers in C

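#include <stdio.h>

int main(void)
{
    float a = 0.7;  /* the double constant 0.7 is rounded to float here */

    if (a < 0.7)    /* a is converted back to double for the comparison */
        printf("c");
    else
        printf("c++");
    return 0;
}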

In the code above, "c" is printed for 0.7, but if 0.7 is replaced with 0.8, "c++" is printed. Why?

And how is any float represented in binary form?

Some sources say that internally 0.7 is stored as 0.699997, but 0.8 as 0.8000011. Why is that?

userv asked Nov 29 '10 17:11



1 Answer

Basically, with a float (IEEE 754 single precision) you get 32 bits that encode

VALUE   = (-1)^SIGN * 1.MANTISSA * 2 ^ (EXPONENT - 127)
32 bits =  1 bit      23 bits          8 bits

and that is stored as

MSB                    LSB
[SIGN][EXPONENT][MANTISSA]

Since only 23 mantissa bits are stored, that's the amount of "precision" you get. If you try to represent a value whose binary expansion never terminates (an irrational number, or a fraction that repeats in base 2), the sequence of bits is "rounded off" after the last available bit.

0.7 in base 10 is 7 / 10, which in binary is 0b111 / 0b1010. Carrying out that division gives:

0.1011001100110011001100110011001100110011001100110011... etc

Since this repeats, in fixed precision there is no way to exactly represent it. The same goes for 0.8 which in binary is:

0.1100110011001100110011001100110011001100110011001100... etc

To see what the fixed-precision value of these numbers is, you have to "cut them off" at the number of bits you have and do the math. The only trick is that the leading 1 is implied and not stored, so you technically get an extra bit of precision (24 significant bits in total). Because of rounding, the last stored bit will be a 1 or a 0 depending on the value of the bits that were cut off.

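So the value of 0.7 is effectively 11,744,051 / 2^24 (no rounding effect) = 0.699999988 and the value of 0.8 is effectively 13,421,773 / 2^24 (rounded up) = 0.800000012. In the comparison a < 0.7, the constant 0.7 is a double, which is much closer to 0.7 than the float is, and a is converted to double before comparing. The stored 0.7 sits slightly below the double 0.7, so "c" is printed; the stored 0.8 sits slightly above the double 0.8, so "c++" is printed.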

That's all there is to it :)

vicatcu answered Nov 15 '22 13:11