
Range of integers that can be expressed precisely as floats / doubles [duplicate]

What is the exact range of (contiguous) integers that can be expressed as a double (resp. float)? I ask because I am curious, for questions such as this one, about when a loss of accuracy will occur.

That is

  1. What is the least positive integer m such that m+1 cannot be precisely expressed as a double (resp. float)?
  2. What is the greatest negative integer -n such that -n-1 cannot be precisely expressed as a double (resp. float)? (Here n may be the same as m above.)

This means that every integer between -n and m has an exact floating-point representation. I'm basically looking for the range [-n, m] for both floats and doubles.

Let's limit the scope to the standard IEEE 754 32-bit and 64-bit floating point representations. I know that the float has 24 bits of precision and the double has 53 bits (both with a hidden leading bit), but due to the intricacies of the floating point representation I'm looking for an authoritative answer for this. Please don't wave your hands!

(Ideal answer would prove that all the integers from 0 to m are expressible, and that m+1 is not.)

Andrew Mao asked Mar 26 '13 17:03



1 Answer

Since you're asking about IEEE floating-point types, the language does not matter.

#include <iostream>
using namespace std;

int main(){

    float f0 = 16777215.; // 2^24 - 1
    float f1 = 16777216.; // 2^24
    float f2 = 16777217.; // 2^24 + 1

    cout << (f0 == f1) << endl; // 0: 2^24 - 1 and 2^24 are distinct floats
    cout << (f1 == f2) << endl; // 1: 2^24 + 1 was rounded to 2^24

    double d0 = 9007199254740991.; // 2^53 - 1
    double d1 = 9007199254740992.; // 2^53
    double d2 = 9007199254740993.; // 2^53 + 1

    cout << (d0 == d1) << endl; // 0: 2^53 - 1 and 2^53 are distinct doubles
    cout << (d1 == d2) << endl; // 1: 2^53 + 1 was rounded to 2^53
}

Output:

0
1
0
1

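So the limit for float is 2^24 and the limit for double is 2^53: 2^24 + 1 rounds to 2^24 when stored in a float, and 2^53 + 1 rounds to 2^53 when stored in a double. Every integer from 0 up to the limit fits in the 24-bit (resp. 53-bit) significand, so the contiguous ranges are [-2^24, 2^24] for float and [-2^53, 2^53] for double. Negatives are the same since the only difference is the sign bit.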

Kyurem answered Sep 19 '22 11:09