
Which is the first integer that an IEEE 754 float is incapable of representing exactly?

For clarity, if I'm using a language that implements IEEE 754 floats and I declare:

float f0 = 0.f;
float f1 = 1.f;

...and then print them back out, I'll get 0.0000 and 1.0000 - exactly.

But IEEE 754 isn't capable of representing all the numbers along the real line. Close to zero, the 'gaps' are small; as you get further away, the gaps get larger.

So, my question is: for an IEEE 754 float, which is the first (closest to zero) integer which cannot be exactly represented? I'm only really concerned with 32-bit floats for now, although I'll be interested to hear the answer for 64-bit if someone gives it!

I thought this would be as simple as calculating 2^bits_of_mantissa and adding 1, where bits_of_mantissa is how many bits the standard exposes. I did this for 32-bit floats on my machine (MSVC++, Win64), and it seemed fine, though.

Floomi asked Oct 10 '22 04:10



2 Answers

2^(mantissa bits + 1) + 1

The +1 in the exponent (mantissa bits + 1) is because, if the mantissa contains abcdef... the number it represents is actually 1.abcdef... × 2^e, providing an extra implicit bit of precision.

Therefore, the first integer that cannot be accurately represented and will be rounded is:

  • For 32-bit floats, 16,777,217 (2^24 + 1).
  • For 64-bit floats, 9,007,199,254,740,993 (2^53 + 1).

Here's an example in CPython 3.10, which uses 64-bit floats:

>>> 9007199254740993.0
9007199254740992.0
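The 32-bit case can be checked in a similar way. The following is only a sketch: it assumes that the standard `struct` module's `"f"` format (IEEE 754 binary32) is an acceptable stand-in for a C `float`, and `to_float32` is a hypothetical helper name, not a library function.

```python
import struct


def to_float32(x):
    """Round x to the nearest 32-bit float and return it as a Python float.

    Hypothetical helper: packs the value as IEEE 754 binary32 and unpacks it again.
    """
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


limit = 2 ** 24 + 1  # 16,777,217, the value given above for 32-bit floats

# For a few integers around the limit, report whether they survive the round trip.
for n in range(limit - 2, limit + 2):
    print(n, to_float32(n) == n)
```

Within this window only 16,777,217 fails the round trip. 16,777,218 is representable again, because adjacent 32-bit floats are 2 apart at that magnitude.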
kennytm answered Oct 13 '22 06:10


The largest value representable by an n-bit integer is 2^n - 1. As noted above, a float has 24 bits of precision in the significand, which would seem to imply that 2^24 wouldn't fit.

However.

Powers of 2 within the range of the exponent are exactly representable as 1.0×2^n, so 2^24 can fit and consequently the first unrepresentable integer for float is 2^24 + 1. As noted above. Again.
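To see why 2^24 fits, it may help to look at the stored fields directly. This is a minimal sketch, assuming the `struct` module's `"f"` format matches the 32-bit float in question; `decode_float32` is a hypothetical helper and only handles positive normal numbers.

```python
import struct


def decode_float32(x):
    """Split a 32-bit float into (sign, unbiased exponent, 23-bit fraction field)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", float(x)))
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127  # remove the exponent bias
    fraction = bits & 0x7FFFFF              # stored bits only; the leading 1 is implicit
    return sign, exponent, fraction


for n in (2 ** 24 - 1, 2 ** 24, 2 ** 24 + 2):
    sign, exponent, fraction = decode_float32(n)
    # For positive normal numbers: value = (1 + fraction / 2**23) * 2**exponent
    value = (1 + fraction / 2 ** 23) * 2 ** exponent
    print(n, exponent, format(fraction, "023b"), value)
```

For 2^24 the fraction field is all zeros and the exponent is 24, i.e. 1.0×2^24. The next representable value sets only the lowest fraction bit, which is worth 2 at this exponent, so 2^24 + 1 has no encoding.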

thus spake a.k. answered Oct 13 '22 07:10