
Why does an IEEE 754 single-precision float have only 7 digits of precision?

Why does a single-precision floating point number have 7 digits of precision (and a double 15-16 digits)?

Can anyone please explain how we arrive at that from the 32 bits assigned to a float (Sign (31), Exponent (30-23), Fraction (22-0))?

avulosunda asked Dec 15 '22 05:12


2 Answers

Only the 23 fraction bits (22-0) of the significand are stored in the memory format, but the total precision is actually 24 bits, because normal numbers have an implicit leading 1. This is equivalent to log10(2^24) ≈ 7.225 decimal digits.

Double-precision float has 52 bits in fraction, plus the leading 1 is 53. Therefore a double can hold log10(2^53) ≈ 15.955 decimal digits, not quite 16.
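The digit counts above can be checked with a few lines of Java. This is only a sketch: the class name PrecisionDigits and the method decimalDigits are made up for illustration, and the last line uses 2^24 + 1 as an example of an integer that needs 25 significant bits.

```java
public class PrecisionDigits {
    // Decimal digits carried by a binary significand of the given width:
    // log10(2^bits) = bits * log10(2).
    static double decimalDigits(int significandBits) {
        return significandBits * Math.log10(2.0);
    }

    public static void main(String[] args) {
        System.out.printf("float  (24 bits): %.3f digits%n", decimalDigits(24));
        System.out.printf("double (53 bits): %.3f digits%n", decimalDigits(53));

        // 2^24 + 1 = 16777217 needs 25 significant bits, so converting it
        // to float rounds it to the neighbouring value 2^24 = 16777216.
        System.out.println((float) 16_777_217 == 16_777_216f);
    }
}
```

The final comparison prints true, which is the 24-bit limit showing up in the 8th decimal digit.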

Note: The leading 1 is not the sign bit. The value is actually (-1)^sign * 1.ffffffff * 2^(eeee-constant), where the constant is the exponent bias (127 for single precision, 1023 for double). We need not store the leading 1 in the fraction, but the sign bit must still be stored.
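To see the three fields and the implicit leading 1 in practice, here is a rough sketch that pulls a float apart with Float.floatToIntBits and puts it back together. The class and method names are hypothetical, and rebuild only handles normal numbers (not zero, subnormals, infinities or NaN).

```java
public class FloatFields {
    static int sign(float f)     { return Float.floatToIntBits(f) >>> 31; }          // bit 31
    static int exponent(float f) { return (Float.floatToIntBits(f) >>> 23) & 0xFF; } // bits 30-23
    static int fraction(float f) { return Float.floatToIntBits(f) & 0x7FFFFF; }      // bits 22-0

    // Rebuild a normal float from its fields:
    // (-1)^sign * 1.fraction * 2^(exponent - 127)
    static float rebuild(int sign, int exponent, int fraction) {
        double significand = 1.0 + fraction / (double) (1 << 23); // add the implicit leading 1
        double value = significand * Math.pow(2.0, exponent - 127);
        return (float) (sign == 0 ? value : -value);
    }

    public static void main(String[] args) {
        float f = -6.25f; // -1.5625 * 2^2
        System.out.println("sign     = " + sign(f));
        System.out.println("exponent = " + exponent(f) + " (biased)");
        System.out.println("fraction = 0x" + Integer.toHexString(fraction(f)));
        System.out.println("rebuilt  = " + rebuild(sign(f), exponent(f), fraction(f)));
    }
}
```

Nothing in the 23 stored fraction bits holds the leading 1. It is added back only when the value is reconstructed.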


There are some numbers that cannot be represented as a finite sum of powers of 2, such as 1/9:

>>>> double d = 0.111111111111111;
>>>> System.out.println(d + "\n" + d*10);
0.111111111111111
1.1111111111111098

If a financial program were to do this calculation over and over without self-correcting, there would eventually be discrepancies.

>>>> double d = 0.111111111111111;
>>>> double sum = 0;
>>>> for(int i=0; i<1000000000; i++) {sum+=d;}
>>>> System.out.println(sum);
111111108.91914201

After 1 billion summations the total should be 111111111.111111, so we are missing over $2.
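One possible form of "self-correcting" is compensated (Kahan) summation, which carries the rounding error of each addition into the next one. The answer does not say which technique it has in mind, so treat this as an illustrative sketch only. The class and method names are made up, and the loop count is cut to one million so it runs instantly.

```java
import java.math.BigDecimal;

public class CompensatedSum {
    // Plain repeated addition, as in the loop above.
    static double naiveSum(double d, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += d;
        }
        return sum;
    }

    // Kahan summation: c tracks the low-order bits lost by each addition.
    static double kahanSum(double d, int n) {
        double sum = 0.0;
        double c = 0.0;
        for (int i = 0; i < n; i++) {
            double y = d - c;      // apply the correction from the previous step
            double t = sum + y;    // low-order bits of y may be lost here
            c = (t - sum) - y;     // recover what was lost
            sum = t;
        }
        return sum;
    }

    // Absolute distance from the exact product d * n, computed with BigDecimal.
    static double error(double computed, double d, int n) {
        BigDecimal exact = new BigDecimal(d).multiply(BigDecimal.valueOf(n));
        return new BigDecimal(computed).subtract(exact).abs().doubleValue();
    }

    public static void main(String[] args) {
        double d = 0.111111111111111;
        int n = 1_000_000; // the original loop used 1 billion
        System.out.println("naive error: " + error(naiveSum(d, n), d, n));
        System.out.println("kahan error: " + error(kahanSum(d, n), d, n));
    }
}
```

For actual money, a decimal type such as BigDecimal is the more usual choice, since it avoids the binary representation issue altogether.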

Ron answered Jan 05 '23 16:01


A 32-bit float has 23 fraction bits, so the smallest step in the fraction is

2^(-23) = 0.00000011920928955078125

The fraction cannot change by less than 0.00000011920928955078125, and every fraction value is an integer multiple of that step:

0.00000011920928955078125 * n

So any value of the form 0.00000x (x from 1 to 9) is easy to express, and float32 certainly has 6 digits of precision. Ignoring rounding (that is, truncating to 7 decimal places), the 7-digit values work out as below:

0.00000011920928955078125 * 1 ≈ 0.0000001
0.00000011920928955078125 * 2 ≈ 0.0000002
0.00000011920928955078125 * 3 ≈ 0.0000003
0.00000011920928955078125 * 4 ≈ 0.0000004
0.00000011920928955078125 * 5 ≈ 0.0000005
0.00000011920928955078125 * 6 ≈ 0.0000007
0.00000011920928955078125 * 7 ≈ 0.0000008
0.00000011920928955078125 * 8 ≈ 0.0000009
0.00000011920928955078125 * 9 ≈ 0.0000010

No multiple of the step gives 0.0000006, so that value cannot be expressed. This is why float32 is commonly quoted as having 6 to 7 digits of precision.
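The step size can be observed directly in Java for floats between 1 and 2, where the fraction step is the gap between neighbouring values. This is just a sketch with made-up names (FloatSpacing, stepAtOne, multipleOfStep). It checks that 1 + 5 steps and 1 + 6 steps are adjacent floats, so a literal such as 1.0000006 has to land on one of them.

```java
public class FloatSpacing {
    // Gap between 1.0f and the next representable float above it.
    static float stepAtOne() {
        return Math.nextUp(1.0f) - 1.0f;
    }

    // 1 plus n fraction steps; exact for small n because the result is itself a float.
    static float multipleOfStep(int n) {
        return 1.0f + n * stepAtOne();
    }

    public static void main(String[] args) {
        // The step equals 2^-23, written here as a hexadecimal float literal.
        System.out.println(stepAtOne() == 0x1.0p-23f);

        // 1 + 5 steps and 1 + 6 steps are neighbours: nothing lies between them.
        System.out.println(Math.nextUp(multipleOfStep(5)) == multipleOfStep(6));

        // The literal 1.0000006 falls in that gap and is stored as 1 + 5 steps.
        System.out.println(1.0000006f == multipleOfStep(5));
    }
}
```

All three comparisons print true. For larger magnitudes the gap grows with the exponent, which is why the digit count is stated relative to the leading digit and not as a fixed number of decimal places.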

上山老人 answered Jan 05 '23 17:01