
integer stored as float

Tags:

math

ieee-754

I have some questions regarding integers and floats:

  1. Can I store every 32-bit unsigned integer value in a 64-bit IEEE floating point value (such that when I assign the double value back to an int, the int will contain the original value)?

  2. What are the smallest (magnitude wise) positive and negative integer values that cannot be stored in a 32-bit IEEE floating point value (by the same definition as in 1)?

  3. Do the answers to these questions depend on language used?

//edit: I know these questions sound a bit like they come from some test, but I'm asking because I need to make some decisions on a data format definition.

matthias_buehlmann asked Oct 21 '22 00:10



1 Answer

  1. Yes, you can store any 32-bit integer in a 64-bit double without information loss. A double's significand (mantissa) has 53 bits of precision, which is more than the 32 bits needed.
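  2. A 32-bit float has a 24-bit significand (23 stored bits plus one implicit leading bit), so every integer from -2^24 to 2^24 (-16777216 to 16777216) is exactly representable. The smallest-magnitude integers that do not survive the round trip are 2^24+1 and -(2^24+1) (16777217 and -16777217). Above 2^24, neighbouring integers start to share one float value; for example, (float)16777217 == 16777216.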
  3. If you assume the language follows IEEE-754, it doesn't depend on the language.
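
The round-trip claims in points 1 and 2 can be checked with a small sketch. It is written in Python only for convenience, and the helper names `roundtrips_float32` and `roundtrips_float64` are made up for this illustration. It assumes that Python's `float` is an IEEE 754 double (true on common platforms) and relies on the `struct` module's `f` format to convert to a 32-bit float.

```python
import struct


def roundtrips_float32(n: int) -> bool:
    """Return True if n survives int -> 32-bit float -> int unchanged."""
    # Packing with "f" converts the value to an IEEE 754 binary32,
    # rounding if n is not exactly representable.
    (as_float32,) = struct.unpack("<f", struct.pack("<f", float(n)))
    return int(as_float32) == n


def roundtrips_float64(n: int) -> bool:
    """Return True if n survives int -> 64-bit double -> int unchanged."""
    # Assumption: Python's float is an IEEE 754 binary64 (double).
    return int(float(n)) == n


if __name__ == "__main__":
    # Point 1: the extremes of the 32-bit integer ranges fit in a double.
    print(roundtrips_float64(2**32 - 1))   # True
    print(roundtrips_float64(-(2**31)))    # True

    # Point 2: 2^24 is still exact in a 32-bit float, 2^24 + 1 is not.
    print(roundtrips_float32(2**24))       # True
    print(roundtrips_float32(2**24 + 1))   # False
    print(roundtrips_float32(-(2**24 + 1)))  # False
```

Only the boundary values are tested here; an exhaustive loop over all 2^32 integers would give the same answer for doubles but is slow in pure Python.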
Joni answered Dec 02 '22 05:12
