float to double assignment

Question

Consider the following code snippet:

float num = 281.583f;
int amount = (int) Math.round(num*100f);
float rounded = amount/100.0f;
double dblPrecision = rounded;
double dblPrecision2 = num;
System.out.println("num : " + num + " amount: " + amount + " rounded: " + rounded + " dbl: " + dblPrecision + " dbl2: " + dblPrecision2);

The output I get is:

num : 281.583 amount: 28158 rounded: 281.58 dbl: 281.5799865722656 dbl2: 281.5830078125

Why is there an approximation when a float value is assigned to a double variable?

Andrey · Accepted Answer

The approximation actually takes place when you convert the decimal fraction to a float. It may surprise you, but 281.583 can't be represented exactly as a binary floating-point number. That is because floating-point numbers are stored as sums of binary fractions (1/2, 1/4, 1/8, ...). Fractions such as 0.5, 0.25 and 0.125 can be converted exactly, but 0.583 cannot.
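A quick way to see this from Java (a sketch, not part of the original answer; the class name ExactFloatValues is made up) is to widen the float to double, which is exact, and then expand it with new BigDecimal(double), which converts the stored binary value to decimal without any rounding:

```java
import java.math.BigDecimal;

public class ExactFloatValues {
    // Widening float -> double is exact, and new BigDecimal(double) expands the
    // double's binary value into decimal digits without any rounding.
    public static String exactValue(float f) {
        return new BigDecimal((double) f).toPlainString();
    }

    public static void main(String[] args) {
        float[] samples = {0.5f, 0.25f, 0.125f, 0.583f, 281.583f};
        for (float f : samples) {
            System.out.println(f + " is stored as " + exactValue(f));
        }
    }
}
```

The values 0.5, 0.25 and 0.125 come back unchanged, while 0.583f and 281.583f expand to longer decimals; 281.583f turns out to be stored as 281.5830078125, the dbl2 value from the question.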

Floats (and doubles) represent the fractional part as Σ(B_i · 2^-i), where B_i is the i-th bit (0 or 1). For example, 0.625 = 1/2 + 1/8. The problem is that not every decimal fraction can be written as a finite sum of binary fractions.
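The sum can be evaluated directly. The following sketch (the class BinaryFractions and the method fromBits are hypothetical names) computes Σ(B_i · 2^-i) for a given sequence of bits, starting with B_1:

```java
public class BinaryFractions {
    // Hypothetical helper: evaluates B_1*2^-1 + B_2*2^-2 + ... where bits[0] is B_1.
    // The result is exact as long as there are at most 53 bits.
    public static double fromBits(int... bits) {
        double value = 0.0;
        double weight = 0.5; // 2^-1, the weight of the first bit
        for (int bit : bits) {
            if (bit != 0 && bit != 1) {
                throw new IllegalArgumentException("each bit must be 0 or 1");
            }
            value += bit * weight;
            weight /= 2; // move to the next binary place
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(fromBits(1, 0, 1)); // 1/2 + 1/8, prints 0.625
        System.out.println(fromBits(1, 1));    // 1/2 + 1/4, prints 0.75
    }
}
```

Going the other way, from 0.583 to bits, never terminates, which is why the float has to cut the sum off somewhere.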

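Here is how the fractional part 0.583 is converted. Each row doubles the remaining fraction; the integer part of the result (0 or 1) is the next bit and is trimmed off before the next row. The first line gives the column headings.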

i   x*2     bit   bit*2^-i
    0.583
1   1.166   1     0.5
2   0.332   0     0
3   0.664   0     0
4   1.328   1     0.0625
5   0.656   0     0
6   1.312   1     0.015625
7   0.624   0     0
8   1.248   1     0.00390625
9   0.496   0     0
10  0.992   0     0
11  1.984   1     0.000488281
12  1.968   1     0.000244141
13  1.936   1     0.00012207
14  1.872   1     6.10352E-05
15  1.744   1     3.05176E-05
16  1.488   1     1.52588E-05
17  0.976   0     0
18  1.952   1     3.8147E-06
19  1.904   1     1.90735E-06
                  SUM = 0.582998276
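The doubling procedure in the table can be written as a short loop. This sketch uses BigDecimal so that the decimal input 0.583 is handled exactly instead of first going through a float; binaryDigits is a hypothetical helper name. It also links the table to the question's dbl2 output: 281 needs 9 binary digits and a float has a 24-bit significand, so only 15 fractional bits survive, rounded to nearest.

```java
import java.math.BigDecimal;

public class DecimalToBinaryFraction {
    // Hypothetical helper: first n binary digits of a decimal fraction 0 <= x < 1,
    // using the table's method: double x, emit the integer part as a bit, trim it.
    public static String binaryDigits(String decimalFraction, int n) {
        BigDecimal x = new BigDecimal(decimalFraction); // exact decimal, no float involved
        BigDecimal two = BigDecimal.valueOf(2);
        StringBuilder bits = new StringBuilder();
        for (int i = 0; i < n; i++) {
            x = x.multiply(two);
            if (x.compareTo(BigDecimal.ONE) >= 0) {
                bits.append('1');
                x = x.subtract(BigDecimal.ONE); // trim the integer part
            } else {
                bits.append('0');
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // Same 19 bits as the table: 1001010100111111011
        System.out.println(binaryDigits("0.583", 19));

        float f = 281.583f;
        // Spacing of floats near 281 is 2^-15, i.e. only 15 fractional bits are kept
        System.out.println(Math.ulp(f) == Math.scalb(1.0f, -15)); // true
        // Widening shows the rounded value; prints 281.5830078125 (dbl2 in the question)
        System.out.println((double) f);
    }
}
```

Bit 16 of 0.583 is 1 and later bits are non-zero, so rounding the 15 kept bits goes up, which is why the stored value (281.5830078125) is slightly larger than 281.583 while the 19-bit partial sum in the table is slightly smaller.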

Michael Borgwardt · Answer

Because floats are binary fractions and thus can only represent your decimal number approximately. The approximation happens when the literal 281.583f in the source code is parsed into an IEEE 754 float value.
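The result of that parsing step can be inspected with Float.floatToIntBits. The sketch below (class and method names are made up for illustration) splits the 32 bits of an IEEE 754 single-precision value into sign, exponent (8 bits, bias 127) and the 23 stored fraction bits, then rebuilds the value to show nothing else is stored. It assumes a normal (non-zero, non-subnormal) float.

```java
public class FloatBits {
    // Unbiased exponent of a normal float (IEEE 754 single: 8 exponent bits, bias 127).
    public static int unbiasedExponent(float f) {
        int bits = Float.floatToIntBits(f);
        return ((bits >>> 23) & 0xFF) - 127;
    }

    // The 23 stored fraction bits; the leading 1 of a normal float is implicit.
    public static int storedFraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float f = 281.583f; // the literal is rounded to the nearest float here
        int sign = Float.floatToIntBits(f) >>> 31;
        int exponent = unbiasedExponent(f);
        int fraction = storedFraction(f);
        System.out.println("sign=" + sign + " exponent=" + exponent
                + " fraction=" + Integer.toBinaryString(fraction));

        // Rebuild the value as (1 + fraction/2^23) * 2^exponent; this is exact in a double
        double rebuilt = (1 + fraction / (double) (1 << 23)) * Math.pow(2, exponent);
        System.out.println(rebuilt == f); // true
    }
}
```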

With the floats themselves, this is glossed over because the string conversion used by println (Float.toString) prints, in the words of its Javadoc, "as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type float."

In many cases, that means the decimal value of the literal is printed. However, when you assign the value to a double, the "adjacent values of type double" are much, much closer together than those of type float, so you get to see the true value of your approximated float.
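The effect of the different neighbour spacing can be made visible with Math.ulp, which returns the gap to the next representable value. A minimal sketch (the class name AdjacentValues is made up): the same binary value prints as 281.583 through Float.toString but as 281.5830078125 through Double.toString, because the double neighbours are 2^29 times closer.

```java
public class AdjacentValues {
    public static void main(String[] args) {
        float f = 281.583f;
        double d = f; // widening is exact: d holds the same binary value as f

        // Gap to the next representable value near 281.583
        System.out.println("float spacing:  " + Math.ulp(f)); // 2^-15
        System.out.println("double spacing: " + Math.ulp(d)); // 2^-44

        // Shortest string that identifies the value among its own type's neighbours
        System.out.println(Float.toString(f));  // 281.583
        System.out.println(Double.toString(d)); // 281.5830078125

        // "281.583" read back as a float gives the same float, but as a double it does not
        System.out.println(Float.parseFloat("281.583") == f);   // true
        System.out.println(Double.parseDouble("281.583") == d); // false
    }
}
```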

For more details, read The Floating-Point Guide.

Tags: java, floating-accuracy, double-precision, approximation

Prabhu R
