I realize that whenever one is dealing with IEEE 754 doubles and floats, some numbers can't be represented, especially when one tries to represent numbers with lots of digits after the decimal point. This is well understood, but I was curious whether there are any whole numbers within the MIN/MAX range of a double (or float) that can't be represented and thus need to be rounded to the nearest representable IEEE 754 value.
For instance, very large numbers are sometimes stored in doubles or floats even when they are whole numbers. Clearly a straight-up int64 or some such large integer datatype would be better, but people still use doubles for large numbers every so often.
Are there any numbers that can be called out as non-representable or can you give me a mathematical reason why it wouldn't be a problem?
An int is an integer, which you might remember from math is a whole number, while a double is a floating-point number that can also carry a fractional part: the literal 1 is an int, while 1.0 is a double.
The largest finite double (Double.MaxValue) is positive 1.7976931348623157E+308; the result of an operation that exceeds this value overflows to infinity.
A float holds roughly 7 significant decimal digits of precision, while a double in Java holds roughly 15-16 significant decimal digits.
A double-precision, floating-point number is a 64-bit approximation of a real number. The number can be zero, or its magnitude can range from about 2.2250738585072014E-308 (the smallest normal value) up to 1.7976931348623157E+308, with either sign.
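A quick way to see both precision limits in code, here as a Java sketch (variable names are mine; the printed values assume standard IEEE 754 binary32/binary64 float and double):

    // float has a 24-bit significand: 2^24 + 1 is the first whole number it can't hold
    float f = 16777217f;                    // rounds to 16777216
    System.out.println((long) f);           // 16777216

    // double has a 53-bit significand: 2^53 + 1 is the first whole number it can't hold
    double d = 9007199254740993d;           // 2^53 + 1, rounds to 2^53
    System.out.println((long) d);           // 9007199254740992

    // the range constants quoted above
    System.out.println(Double.MAX_VALUE);   // 1.7976931348623157E308
    System.out.println(Double.MIN_NORMAL);  // 2.2250738585072014E-308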
Sure, there are whole numbers that are not representable as double-precision floating-point values.

All whole numbers not exceeding Pow(2, 53), or 9007199254740992, are representable. From Pow(2, 53) to Pow(2, 54) (that's 18014398509481984), only the even numbers are representable; the odd numbers will be rounded.

Of course it continues like that. From Pow(2, 54) to Pow(2, 55), only the multiples of 4 (those whole numbers which 4 divides) are representable, from Pow(2, 55) to Pow(2, 56) only multiples of 8, and so on.
This is because the double-precision floating-point format has 53 bits (binary digits) of precision in the significand (mantissa): 52 bits are stored explicitly plus one implicit leading bit, so above Pow(2, 53) consecutive representable numbers are more than 1 apart.
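A short Java sketch of those boundaries (a long-to-double-to-long round trip makes the rounding visible; variable names are mine):

    long p53 = 1L << 53;                            // 9007199254740992
    System.out.println((long) (double) p53);        // 9007199254740992 : exact
    System.out.println((long) (double) (p53 + 1));  // 9007199254740992 : odd, gets rounded
    System.out.println((long) (double) (p53 + 2));  // 9007199254740994 : even, exact

    long p54 = 1L << 54;                            // 18014398509481984
    System.out.println((long) (double) (p54 + 2));  // 18014398509481984 : rounded to a multiple of 4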
It is easy to verify my claims. For example, take the number 10000000000000001 as an integer64. Convert it to double and then back to integer64. You will see the precision loss.
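In Java the check looks like this (a minimal sketch; long plays the role of integer64 here):

    long n = 10000000000000001L;    // 10^16 + 1, which is above 2^53
    double d = (double) n;          // rounds to the nearest representable double
    long back = (long) d;
    System.out.println(back);       // 10000000000000000 : the trailing 1 is lost
    System.out.println(n == back);  // false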
When you take very large double-precision numbers, only a tiny percentage of the whole numbers is representable. For example, near 1E+300 (which is between Pow(2, 996) and Pow(2, 997)) we are talking multiples of Pow(2, 944) (about 1.4870169084777831E+284). This is consistent with the fact that a double is precise up to approximately 16 decimal figures, so a whole number with 300 figures will be "remembered" only by its first approx. 16 figures (actually 53 binary digits).
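In Java, Math.ulp reports that spacing directly (Math.scalb(1.0, 944) is just an exact way to compute Pow(2, 944) for comparison):

    // the gap between consecutive doubles near 1E+300 is 2^944
    System.out.println(Math.ulp(1e300));       // 1.4870169084777831E284
    System.out.println(Math.scalb(1.0, 944));  // 1.4870169084777831E284, i.e. 2^944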
Addition: The first power of ten that is not exactly representable is 1E+23 (or 100 sextillion, short scale naming style). Near that number, only integral multiples of 16777216 (that is, Pow(2, 24)) are representable, but ten to the 23rd power is clearly not a multiple of two to the 24th power. The prime factorization is 10**23 == 2**23 * 5**23, so we can divide evenly by two only 23 times, not 24 times as required.
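To see this in Java, print the exact decimal value of the double nearest to 1E+23 (a minimal sketch; new java.math.BigDecimal(double) shows the exact binary value rather than a rounded rendering):

    System.out.println(new java.math.BigDecimal(1e23));  // 99999999999999991611392
    System.out.println(Math.ulp(1e23));                   // 1.6777216E7, i.e. 2^24, the spacing near 1E+23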