I know a little bit about how floating-point numbers are represented, but not enough, I'm afraid.
The general question is:
For a given precision (for my purposes, the number of accurate decimal places in base 10), what range of numbers can be represented for 16-, 32- and 64-bit IEEE-754 systems?
Specifically, I'm only interested in the range of 16-bit and 32-bit numbers accurate to +/-0.5 (the ones place) or +/- 0.0005 (the thousandths place).
So there are 2^32 - 2^25 = 4261412864 distinct normal numbers in the IEEE 754 binary32 format.
For single-precision floating-point, exponents in the range of -126 to + 127 are biased by adding 127 to get a value in the range 1 to 254 (0 and 255 have special meanings).
32-bit single precision, with an approximate range of 10 -101 to 10 90 and precision of 7 decimal digits. 64-bit double precision, with an approximate range of 10 -398 to 10 369 and precision of 16 decimal digits.
For a given IEEE-754 floating point number X, if
2^E <= abs(X) < 2^(E+1)
then the distance from X to the next largest representable floating point number (epsilon) is:
epsilon = 2^(E-52) % For a 64-bit float (double precision) epsilon = 2^(E-23) % For a 32-bit float (single precision) epsilon = 2^(E-10) % For a 16-bit float (half precision)
The above equations allow us to compute the following:
For half precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^10. Any larger than this and the distance between floating point numbers is greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 1. Any larger than this and the distance between floating point numbers is greater than 0.0005.
For single precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^23. Any larger than this and the distance between floating point numbers is greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 2^13. Any larger than this and the distance between floating point numbers is greater than 0.0005.
For double precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^52. Any larger than this and the distance between floating point numbers is greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 2^42. Any larger than this and the distance between floating point numbers is greater than 0.0005.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With