
Can a double represent all values a float can represent?

There are certain int values that a float cannot represent.

However, can a double represent all values a float can represent?

My intuition says yes, since double has more fractional bits & more exponent bits, but there might be some silly gotchas that I'm missing.

asked May 08 '10 21:05 by anon



2 Answers

Yes.

It would probably help to know how floats and doubles work.

Without going too much into details...

Take the number 152853.5047 ( the revolution period of Jupiter's moon Io in seconds )

In scientific notation, this number is 0.1528535047 × 10^6

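Since computers only understand 1 and 0, a floating point type splits such a number into two parts, a mantissa (the digits) and an exponent (the power), and stores each in a fixed number of bits.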

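The mantissa (1528535047) and the exponent (6) are stored within 32 bits. Only 24 bits of precision are available for the mantissa (23 stored bits plus an implied leading 1), so floating point is usually more about precision than size: the larger the number, the less precise it can be. (A real float uses a binary mantissa and a power-of-two exponent; the decimal picture here is a simplification.)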

1528535047 = 1011011000110111001100000000111 in binary, which is 31 bits long. You can only store the first 24 bits, so the last 7 bits (0000111, including the three trailing 1's) are lopped off.

Since integers are 32 bits, you're right: a float can't hold every one of them exactly. The less significant digits get lopped off the end.

Any integer with an absolute value of at most 2^24 (16,777,216) can be stored without losing precision.
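A quick way to see this limit is to push an integer through a float and back. The sketch below assumes float is a 32-bit IEEE 754 type and the default round-to-nearest mode is in effect; the helper names `through_float` and `float_holds_exactly` are made up for illustration.

```c
#include <stdint.h>

/* Convert a 32-bit integer to float and back.  The result is returned
   as int64_t because (float)INT32_MAX rounds up to 2^31, which would
   not fit in an int32_t.  Assumes an IEEE 754 32-bit float. */
int64_t through_float(int32_t n)
{
    volatile float f = (float)n;  /* volatile: force a real float store */
    return (int64_t)f;
}

/* Nonzero when float can hold n exactly. */
int float_holds_exactly(int32_t n)
{
    return through_float(n) == (int64_t)n;
}
```

Under those assumptions, 16777216 survives the trip but 16777217 comes back as 16777216, and 1528535047 comes back as 1528535040, which is the original value with its low seven bits cleared.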

This is how the bits are stored in a floating point number:

Diagram of how floats are stored: http://phimuemue.wordpress.com/files/2009/06/576px-ieee-754-single-svg1.png

One bit for the sign, 8 bits for the exponent and 23 bits for the mantissa. Therefore, to answer your question: since only 23 bits are reserved for the mantissa, a 32-bit integer can't always be shown with full precision. Once a value needs more significant bits than the mantissa has, the lowest ones ( from the right ) get lopped off.
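To poke at those three fields yourself, something like the following should work. It is only a sketch and assumes float is the 32-bit IEEE 754 single format from the diagram; `float_fields` and `split_float` are hypothetical names, not anything standard.

```c
#include <stdint.h>
#include <string.h>

/* The three bit fields of a 32-bit IEEE 754 float (assumed layout). */
typedef struct {
    uint32_t sign;      /* 1 bit                 */
    uint32_t exponent;  /* 8 bits, biased by 127 */
    uint32_t mantissa;  /* 23 stored bits        */
} float_fields;

float_fields split_float(float f)
{
    uint32_t bits;
    float_fields out;

    /* memcpy is the portable way to reinterpret the bytes of f. */
    memcpy(&bits, &f, sizeof bits);

    out.sign     = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.mantissa = bits & 0x7FFFFFu;
    return out;
}
```

For example, 1.0f splits into sign 0, exponent 127 and mantissa 0, while -2.5f (that is, -1.25 × 2^1) gives sign 1, exponent 128 and mantissa 0x200000.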

For a double, you're merely increasing the number of bits that it can store (11 for the exponent and 52 for the mantissa)... in fact, it's called double precision. So any number that can be shown as a float can also be shown as a double: extra 0's are merely added to the end of the mantissa, and the exponent is re-encoded in the wider field.
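The "extra 0's" can be observed directly. This is a sketch assuming IEEE 754 single and double formats and a normal (not subnormal) float; the function names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* The 23 stored mantissa bits of a float (IEEE 754 single assumed). */
uint32_t float_mantissa(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits & 0x7FFFFFu;
}

/* The 52 stored mantissa bits of the double obtained by widening f
   (IEEE 754 double assumed). */
uint64_t widened_mantissa(float f)
{
    double d = (double)f;   /* widening conversion, value is unchanged */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits & 0xFFFFFFFFFFFFFull;
}

/* Nonzero when f survives float -> double -> float unchanged. */
int survives_round_trip(float f)
{
    volatile double d = (double)f;
    volatile float back = (float)d;
    return back == f;
}
```

For a normal float such as 0.1f, the widened mantissa is the float's 23 bits shifted left by 29 places, so the low 29 bits are all zero. Subnormal floats are renormalized when widened, so the simple shift relation does not apply to them, although the value is still preserved.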

For this reason, since a double takes up 64 bits, most people will use a double when converting from a 32-bit int: its 53 bits of precision hold every 32-bit value exactly. A float is fine for converting a 16-bit short.
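Both claims are easy to check by brute force. The sketch below assumes IEEE 754 float and double; `double_holds_exactly` and `every_short_fits_in_float` are names invented for this example.

```c
#include <stdint.h>

/* Nonzero when n survives int32 -> double -> integer unchanged. */
int double_holds_exactly(int32_t n)
{
    volatile double d = (double)n;
    return (int64_t)d == (int64_t)n;
}

/* Nonzero when every 16-bit value survives a trip through float. */
int every_short_fits_in_float(void)
{
    int32_t n;
    for (n = INT16_MIN; n <= INT16_MAX; ++n) {
        volatile float f = (float)n;
        if ((int32_t)f != n)
            return 0;
    }
    return 1;
}
```

With those formats, `double_holds_exactly` should return nonzero for INT32_MIN, INT32_MAX and everything in between, and `every_short_fits_in_float` should return 1.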

answered Sep 21 '22 03:09 by Armstrongest


6.2.5/10 in n1256:

There are three real floating types, designated as float, double, and long double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.

(emphasis mine).

Whether the implementation uses IEEE754 or not is irrelevant: the C99 standard guarantees what you want.
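If you want to see the guarantee reflected on your own platform, the `<float.h>` limits are one place to look. This is a sanity check rather than a proof: it only tests a few consequences you would expect from the subset rule, and `limits_agree_with_subset_rule` is a made-up name.

```c
#include <float.h>

/* Nonzero when the <float.h> limits are consistent with every float
   value also being a double value: float has no more significand
   digits, no larger maximum exponent, and no larger maximum value. */
int limits_agree_with_subset_rule(void)
{
    return FLT_MANT_DIG <= DBL_MANT_DIG
        && FLT_MAX_EXP  <= DBL_MAX_EXP
        && (double)FLT_MAX <= DBL_MAX;
}
```

On a conforming implementation this should return nonzero whether or not the floating types follow IEEE754.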

answered Sep 20 '22 03:09 by Steve Jessop