Consider the following code snippet:
float num = 281.583f;
int amount = (int) Math.round(num*100f);
float rounded = amount/100.0f;
double dblPrecision = rounded;
double dblPrecision2 = num;
System.out.println("num : " + num + " amount: " + amount + " rounded: " + rounded + " dbl: " + dblPrecision + " dbl2: " + dblPrecision2);
The output I get is:
num : 281.583 amount: 28158 rounded: 281.58 dbl: 281.5799865722656 dbl2: 281.5830078125
Why does the approximation appear when a float value is assigned to a double variable?
The approximation actually takes place when the decimal fraction is converted to a float. It may surprise you, but 281.583 can't be represented exactly as a floating-point number on a PC. This is because floating-point numbers are represented as sums of binary fractions: 0.5, 0.25 and 0.125 can be converted precisely, but 0.583 cannot.
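If you want to check this yourself, here is a minimal sketch that prints the exact value stored in a float. It relies on two facts: widening a float to a double is lossless, and the BigDecimal(double) constructor converts the binary value exactly. The class name ExactFloatValue and the helper exact are made up for this illustration.

```java
import java.math.BigDecimal;

public class ExactFloatValue {
    // Returns the exact decimal expansion of the binary value held by a float.
    // (Hypothetical helper, named for this example only.)
    static String exact(float f) {
        // float -> double widening loses nothing; BigDecimal(double) is exact.
        return new BigDecimal((double) f).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(exact(0.5f));     // 0.5
        System.out.println(exact(0.125f));   // 0.125
        System.out.println(exact(281.583f)); // 281.5830078125
    }
}
```

The first two values come back unchanged because they are sums of powers of two; the last one shows what the float really contains.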
Ignoring the sign and exponent, floats (and doubles) are represented as Σ( 1/2^i * Bi ), where Bi is the i-th bit (0 or 1). For example, 0.625 = 1/2 + 1/8. The problem is that not every decimal fraction can be converted to a finite sum of binary fractions.
Here is how the fractional part 0.583 is converted (the first line gives the column headings):
i  | *2 and trim | Bit value | (2^-i)*bit
   | 0.583       |           |
1  | 1.166       | 1         | 0.5
2  | 0.332       | 0         | 0
3  | 0.664       | 0         | 0
4  | 1.328       | 1         | 0.0625
5  | 0.656       | 0         | 0
6  | 1.312       | 1         | 0.015625
7  | 0.624       | 0         | 0
8  | 1.248       | 1         | 0.00390625
9  | 0.496       | 0         | 0
10 | 0.992       | 0         | 0
11 | 1.984       | 1         | 0.000488281
12 | 1.968       | 1         | 0.000244141
13 | 1.936       | 1         | 0.00012207
14 | 1.872       | 1         | 6.10352E-05
15 | 1.744       | 1         | 3.05176E-05
16 | 1.488       | 1         | 1.52588E-05
17 | 0.976       | 0         | 0
18 | 1.952       | 1         | 3.8147E-06
19 | 1.904       | 1         | 1.90735E-06
SUM = 0.582998276
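The table above can be reproduced with a short program. This is only a sketch of the multiply-by-2 method, not how the JVM actually parses literals (real parsing also rounds to nearest instead of truncating). It uses BigDecimal so the decimal arithmetic itself adds no extra error; the class name BinaryFractionDemo and the method truncatedBinarySum are invented for the example.

```java
import java.math.BigDecimal;

public class BinaryFractionDemo {
    // Expands a decimal fraction in [0, 1) into its first `bits` binary digits
    // with the multiply-by-2 method and returns the sum of the chosen 2^-i terms.
    static BigDecimal truncatedBinarySum(String decimalFraction, int bits) {
        BigDecimal two = BigDecimal.valueOf(2);
        BigDecimal half = new BigDecimal("0.5");
        BigDecimal remainder = new BigDecimal(decimalFraction);
        BigDecimal weight = BigDecimal.ONE; // becomes 2^-i inside the loop
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = 1; i <= bits; i++) {
            remainder = remainder.multiply(two); // "*2"
            weight = weight.multiply(half);      // 2^-i, exact in decimal
            int bit = remainder.compareTo(BigDecimal.ONE) >= 0 ? 1 : 0;
            if (bit == 1) {
                remainder = remainder.subtract(BigDecimal.ONE); // "trim"
                sum = sum.add(weight);
            }
            System.out.println(i + "\t" + bit + "\t" + sum.toPlainString());
        }
        return sum;
    }

    public static void main(String[] args) {
        // After 19 bits the running sum is roughly 0.582998276, as in the table.
        truncatedBinarySum("0.583", 19);
    }
}
```

For 0.583 the remainder never reaches zero, so every cut-off point leaves a sum slightly below 0.583. For 0.625 the loop hits a zero remainder after three bits and the sum is exact.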
Because floats are binary fractions, they can only represent your decimal number approximately. The approximation happens when the literal 281.583f in the source code is parsed into an IEEE 754 float value.
With the floats themselves, this is glossed over because println prints "as many, but only as many, more digits as are needed to uniquely distinguish the argument value from adjacent values of type float."
In many cases, that means the decimal value of the literal will be printed. However, when you assign the value to a double, the "adjacent values of type double" are usually much, much closer than those of type float, so you get to see (nearly all of) the true value of your approximated float.
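A small sketch makes the difference in neighbour spacing visible. Math.ulp returns the gap between a value and the next representable value of the same type; the class name FloatVersusDoublePrinting and the helper printed are assumptions made for this example.

```java
public class FloatVersusDoublePrinting {
    // Returns how the same binary value prints as a float and as a double.
    // (Hypothetical helper, named for this example only.)
    static String[] printed(float f) {
        double widened = f; // lossless: the stored value does not change
        return new String[] { Float.toString(f), Double.toString(widened) };
    }

    public static void main(String[] args) {
        float num = 281.583f;
        String[] p = printed(num);
        System.out.println(p[0]); // 281.583
        System.out.println(p[1]); // 281.5830078125

        // Gap to the next representable value: 2^-15 for float, 2^-44 for double.
        System.out.println(Math.ulp(num));
        System.out.println(Math.ulp((double) num));

        // The widened float is not the double closest to 281.583.
        System.out.println((double) num == 281.583); // false
    }
}
```

With float neighbours about 0.00003 apart, "281.583" is enough to identify the value. With double neighbours about 2^-44 apart, many more digits are needed to pin down the very same number.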
For more details, read The Floating-Point Guide.