There are certain int values that a float cannot represent. However, can a double represent all values a float can represent? My intuition says yes, since double has more fractional bits and more exponent bits, but there might be some silly gotchas that I'm missing.


Can a double represent all values a float can represent?

2 Answers

Yes.

It would probably help to know how floats and doubles work.

Without going too much into details...

Take the number 152853.5047 (the revolution period of Jupiter's moon Io in seconds).

In scientific notation, this number is 0.1528535047 × 10^6

Since computers only understand 1s and 0s, there is no way to store the decimal point directly, so the number is split into a mantissa and an exponent.

The mantissa (1528535047) and the exponent (6) are stored within 32 bits. If I remember correctly, only 24 bits are for the mantissa, so the limit you hit first is usually precision rather than range: the larger the number, the less precisely it can be stored.

In binary, 1528535047 = 1011011000110111001100000000111 (31 bits), so only the first 24 bits can be stored. The low 7 bits, including the last three 1s, are lopped off.

Since integers are 32 bits, you're right: a float can't hold all of them exactly. The less significant digits get lopped off the end.

Any integer with an absolute value of at most 2^24 (16,777,216) can be stored without losing precision.

This is how the bits are stored in a floating point number:

Diagram of how floats are stored: http://phimuemue.wordpress.com/files/2009/06/576px-ieee-754-single-svg1.png

One bit for the sign, 8 bits for the exponent, and 23 bits for the mantissa (an implicit leading 1 brings the precision to 24 bits). Therefore, to answer your question: since only 23 bits are reserved for the mantissa, a 32-bit integer can't always be represented exactly. Digits start getting lopped off from the right as soon as more bits are needed than the mantissa can hold.

For a double, you're merely increasing the number of bits that can be stored (in IEEE 754, 11 for the exponent and 52 for the mantissa); in fact, it's called double precision. So any number that can be represented as a float can also be represented as a double: extra 0s are merely appended to the mantissa.

For this reason, most people convert a 32-bit int to a double rather than a float: a double takes up 64 bits, 53 of which are mantissa precision, so every 32-bit int fits exactly. A float is good enough for converting a 16-bit short.

175

answered Sep 21 '22 03:09

Armstrongest

6.2.5/10 in n1256:

There are three real floating types, designated as float, double, and long double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.

(emphasis mine).

Whether the implementation uses IEEE 754 or not is irrelevant: the C99 standard guarantees what you want.

answered Sep 20 '22 03:09

Steve Jessop


Tags:

c

int

floating-point

double

anon
