so exactly how many digits can float8, float16, float32, float64, and float128 contain?

Forgive me for asking such a dumb question, but I couldn't find any answers online.

Numpy's dtype documentation only shows X bits of exponent and Y bits of mantissa for each float type, but I couldn't translate X exponent bits and Y mantissa bits into exactly how many digits before/after the decimal point. Is there a simple formula/table to look this up?

Thank you in advance

asked Jun 09 '19 by mathguy


2 Answers

This is not as simple as it may seem. For the precision of the mantissa, there are generally two values of interest:

  1. Given a value in decimal representation, how many decimal digits are guaranteed to be preserved if the value is converted from decimal to a chosen binary format and back (with default rounding).

  2. Given a value in binary format, how many decimal digits are needed when the value is converted to decimal and back to the original binary format (again, with default rounding) so that the original value is recovered unchanged.

In both cases, the decimal representation is considered independently of the exponent used, without leading or trailing zeros (for example, 0.0123e4, 1.23e2, 1.2300e2, 123, 123.0, and 123000.000e-3 all count as 3 digits).

For a 32-bit binary float, these two sizes are 6 and 9 decimal digits, respectively. In C's <float.h>, these are FLT_DIG and FLT_DECIMAL_DIG. (It may seem odd that a 32-bit float actually preserves 7 decimal digits for most numbers, but there are exceptions, so only 6 can be guaranteed.) In C++, look at std::numeric_limits<float>::digits10 and std::numeric_limits<float>::max_digits10, respectively.

For a 64-bit binary float, these are 15 and 17 (DBL_DIG and DBL_DECIMAL_DIG, respectively; and std::numeric_limits<double>::{digits10, max_digits10}).
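In Python, the 64-bit values can be checked directly via the standard library's sys.float_info, which mirrors the <float.h> constants for double:

```python
import sys

# Mirrors C's <float.h> constants for the 64-bit double type.
print(sys.float_info.mant_dig)  # 53 -- significand bits p (including the hidden bit)
print(sys.float_info.dig)       # 15 -- DBL_DIG: decimal digits always preserved
```

(Python does not expose max_digits10 directly, but repr() in Python 3 already prints the shortest string that round-trips, so the 17-digit case is handled for you.)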

General formulas for them (thanks to @MarkDickinson):

  • ${format}_DIG (digits10): floor((p-1)*log10(2))
  • ${format}_DECIMAL_DIG (max_digits10): ceil(1+p*log10(2))

where p is the number of digits in the mantissa (including the hidden bit in the normalized IEEE 754 case).
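As a quick sketch, the two formulas can be evaluated for the standard IEEE 754 significand sizes (p = 11, 24, 53, 113 for binary16/32/64/128); the function names here are illustrative, not from any library:

```python
import math

def digits10(p):
    # ${format}_DIG: decimal digits guaranteed to survive decimal -> binary -> decimal
    return math.floor((p - 1) * math.log10(2))

def max_digits10(p):
    # ${format}_DECIMAL_DIG: decimal digits needed so binary -> decimal -> binary is lossless
    return math.ceil(1 + p * math.log10(2))

for name, p in [("binary16", 11), ("binary32", 24), ("binary64", 53), ("binary128", 113)]:
    print(name, digits10(p), max_digits10(p))
# binary16 3 5, binary32 6 9, binary64 15 17, binary128 33 36
```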

Also, the comments at the C++ numeric limits page give some mathematical explanation:

The standard 32-bit IEEE 754 floating-point type has a 24 bit fractional part (23 bits written, one implied), which may suggest that it can represent 7 digit decimals (24 * std::log10(2) is 7.22), but relative rounding errors are non-uniform and some floating-point values with 7 decimal digits do not survive conversion to 32-bit float and back: the smallest positive example is 8.589973e9, which becomes 8.589974e9 after the roundtrip. These rounding errors cannot exceed one bit in the representation, and digits10 is calculated as (24-1)*std::log10(2), which is 6.92. Rounding down results in the value 6.

Values for 16- and 128-bit floats can be found in the comments there (but see below for what a 128-bit float really is in practice).

For the exponent, this is simpler, because each of the boundary values (minimum normalized, minimum denormalized, maximum representable) is exact and can easily be obtained and printed.
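For example, the boundary values of the 64-bit format can be printed exactly from sys.float_info (the smallest denormal is not listed there, but it follows by scaling the smallest normal down by the remaining significand bits):

```python
import sys

print(sys.float_info.max)  # 1.7976931348623157e+308 -- largest finite double
print(sys.float_info.min)  # 2.2250738585072014e-308 -- smallest normalized double
# Smallest denormal: smallest normal scaled down by the remaining 52 fraction bits.
print(sys.float_info.min * 2.0 ** (1 - sys.float_info.mant_dig))  # 5e-324
```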

@PaulPanzer suggested numpy.finfo. It gives the first of these values ({format}_DIG); maybe it is the thing you are searching for:

>>> numpy.finfo(numpy.float16).precision
3
>>> numpy.finfo(numpy.float32).precision
6
>>> numpy.finfo(numpy.float64).precision
15
>>> numpy.finfo(numpy.float128).precision
18

but, on most systems (mine was Ubuntu 18.04 on x86-64), the value is confusing for float128: it actually describes the 80-bit x86 "extended" float, which has a 64-bit significand; a real IEEE 754 float128 has 112 significand bits, so the value would be around 33, but numpy presents another type under this name. See here for details: in general, float128 is a delusion in numpy.
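To see what numpy's float128 really is on a given system, you can check the significand size reported by numpy.finfo (a sketch; the result is platform-dependent):

```python
import numpy as np

info = np.finfo(np.longdouble)  # float128, where it exists, is an alias of longdouble
# 63 stored fraction bits -> 80-bit x87 extended double (typical on x86-64 Linux);
# 112 would indicate a true IEEE 754 binary128; 52 means plain double.
print(info.nmant, info.precision)
```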

UPD3: you mentioned float8 - there is no such type in the IEEE 754 set. One could imagine such a type for some utterly specific purposes, but its range would be too narrow for any universal usage.

answered Sep 19 '22 by Netch


To keep it simple, generally:

Data-Type | Precision
----------------------
float16   | 3
float32   | 7
float64   | 15
float128  | 18
answered Sep 18 '22 by SREERAG R NANDAN