
float to double assignment

Consider the following code snippet:

float num = 281.583f;
int amount = (int) Math.round(num*100f);
float rounded = amount/100.0f;
double dblPrecision = rounded;
double dblPrecision2 = num;
System.out.println("num : " + num + " amount: " + amount + " rounded: " + rounded + " dbl: " + dblPrecision + " dbl2: " + dblPrecision2);

The output I get is:

num : 281.583 amount: 28158 rounded: 281.58 dbl: 281.5799865722656 dbl2: 281.5830078125

Why does an approximation appear when a float value is assigned to a double variable?

Prabhu R asked Nov 11 '10 12:11



2 Answers

The approximation actually takes place when you convert the decimal fraction to a float. It might surprise you, but 281.583 can't be represented exactly as a floating-point number on a PC. This happens because floating-point numbers are stored as sums of binary fractions. 0.5, 0.25 and 0.125 can be converted precisely, but 0.583 cannot.
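
One way to see the value a float really holds is to print its exact decimal expansion with BigDecimal. This is only a minimal sketch; the class name ExactFloatValue and the helper exact are made up for illustration and are not part of the question's code.

```java
import java.math.BigDecimal;

public class ExactFloatValue {
    // Hypothetical helper: exact decimal expansion of the binary value stored in a float.
    static String exact(float f) {
        // Widening float -> double loses nothing, and new BigDecimal(double)
        // converts the binary value exactly, without any rounding.
        return new BigDecimal(f).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(exact(0.5f));     // 0.5   (a single binary fraction)
        System.out.println(exact(0.125f));   // 0.125 (a single binary fraction)
        System.out.println(exact(281.583f)); // 281.5830078125 (nearest float to 281.583)
    }
}
```

The last line matches the dbl2 value in the question: the float never held 281.583 in the first place.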

The fractional part of a float (or double) is represented as Σ( Bi/2^i ), where Bi is the i-th bit (0 or 1). For example, 0.625 = 1/2 + 1/8. The problem is that not every decimal fraction can be converted to a finite sum of binary fractions.

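Here is how the fractional part, 0.583, is converted (the first line defines the columns). At each step the fraction is doubled; the integer part of the result is the next bit, and only the fractional part is carried to the next row.

 i | fraction*2 | bit | (2^-i)*bit
   | 0.583      |     |
 1 | 1.166      | 1   | 0.5
 2 | 0.332      | 0   | 0
 3 | 0.664      | 0   | 0
 4 | 1.328      | 1   | 0.0625
 5 | 0.656      | 0   | 0
 6 | 1.312      | 1   | 0.015625
 7 | 0.624      | 0   | 0
 8 | 1.248      | 1   | 0.00390625
 9 | 0.496      | 0   | 0
10 | 0.992      | 0   | 0
11 | 1.984      | 1   | 0.000488281
12 | 1.968      | 1   | 0.000244141
13 | 1.936      | 1   | 0.00012207
14 | 1.872      | 1   | 6.10352E-05
15 | 1.744      | 1   | 3.05176E-05
16 | 1.488      | 1   | 1.52588E-05
17 | 0.976      | 0   | 0
18 | 1.952      | 1   | 3.8147E-06
19 | 1.904      | 1   | 1.90735E-06
                  SUM = 0.582998276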
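
The doubling procedure in the table can be sketched in a few lines of Java. This is an illustration only: the class BinaryFraction and its methods are hypothetical names, and BigDecimal is used so that the decimal arithmetic itself is exact (doing the doubling in float or double would add its own binary rounding).

```java
import java.math.BigDecimal;

public class BinaryFraction {
    // First `bits` binary digits of a decimal fraction given as text, 0 <= x < 1.
    static String fractionBits(String decimalFraction, int bits) {
        BigDecimal two = BigDecimal.valueOf(2);
        BigDecimal x = new BigDecimal(decimalFraction);
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= bits; i++) {
            x = x.multiply(two);                    // the "fraction*2" column
            if (x.compareTo(BigDecimal.ONE) >= 0) { // integer part is 1
                sb.append('1');
                x = x.subtract(BigDecimal.ONE);     // keep only the fraction
            } else {
                sb.append('0');
            }
        }
        return sb.toString();
    }

    // Sum of 2^-i over every set bit i (the last column of the table).
    static BigDecimal sumOfBits(String bits) {
        BigDecimal half = new BigDecimal("0.5");
        BigDecimal weight = BigDecimal.ONE;
        BigDecimal sum = BigDecimal.ZERO;
        for (char c : bits.toCharArray()) {
            weight = weight.multiply(half);         // 2^-1, 2^-2, ...
            if (c == '1') {
                sum = sum.add(weight);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String bits = fractionBits("0.583", 19);
        System.out.println(bits);            // 1001010100111111011
        System.out.println(sumOfBits(bits)); // about 0.582998276, as in the table
    }
}
```

Asking for more bits moves the sum closer to 0.583, but the remaining fraction never reaches zero, so no finite number of bits gives 0.583 exactly.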
Andrey answered Sep 27 '22 23:09



Because floats are binary fractions and thus can only represent your decimal number approximately. The approximation happens when the literal 281.583f in the source code is parsed into an IEEE 754 float value.

With the floats themselves, this is glossed over because println prints "as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type float."

In many cases, that means the decimal value of the literal will be printed. However, when you assign the value to a double, the "adjacent values of type double" are usually much, much closer together than those of type float, so more digits are needed to tell them apart and you get to see the true value of your approximated float.
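
The effect of "adjacent values" can be checked directly. The sketch below uses only standard library calls; the class name FloatVsDoublePrinting and the helper viaDecimalString are invented for this example. The helper shows one possible workaround if you want the double nearest to the printed decimal digits, assuming those digits are what you actually care about.

```java
public class FloatVsDoublePrinting {
    // Hypothetical helper: go through the float's short decimal string
    // to obtain the double closest to those printed digits.
    static double viaDecimalString(float f) {
        return Double.parseDouble(Float.toString(f));
    }

    public static void main(String[] args) {
        float num = 281.583f;
        double widened = num; // same binary value, now stored as a double

        System.out.println(num);                // 281.583
        System.out.println(widened);            // 281.5830078125
        System.out.println(widened == 281.583); // false: 281.583 as a double is a different value

        // Distance to the next representable value: large for float, tiny for double.
        System.out.println(Math.ulp(num));
        System.out.println(Math.ulp(widened));

        System.out.println(viaDecimalString(num)); // 281.583
    }
}
```

The assignment itself changes nothing; only the number of digits needed to identify the value among its neighbours changes.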

For more details, read The Floating-Point Guide.

Michael Borgwardt answered Sep 27 '22 22:09
