What types of numbers are representable in binary floating-point?

I've read a lot about floats, but it's all unnecessarily involved. I think I've got it pretty much understood, but there's just one thing I'd like to know for sure:

I know that fractions of the form 1/pow(2,n), with n an integer, can be represented exactly in floating point. This means that if I add 1/32 to itself 32 million times, I would get exactly 1,000,000.

What about something like 1/(32+16)? It's one over the sum of two powers of two. Does this work? Or is it 1/32+1/16 that works? This is where I'm confused, so if anyone could clarify that for me I would appreciate it.

asked Aug 25 '12 19:08

Niet the Dark Absol

2 Answers

The rule can be summed up as this:

A rational number can be represented exactly in binary if and only if, once the fraction is reduced to lowest terms, the prime factorization of the denominator contains only 2 (i.e. the denominator is a power of two).

So 1/(32 + 16) = 1/48 is not representable in binary because it has a factor of 3 in the denominator (48 = 2^4 × 3). But 1/32 + 1/16 = 3/32 is, because its denominator is 2^5.
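As a rough check of the rule above, here is a small Python sketch. The helper name `is_dyadic` is made up for illustration, and the sketch only tests the "denominator is a power of two" condition, not the limits of any particular floating-point type.

```python
from fractions import Fraction

def is_dyadic(q: Fraction) -> bool:
    """Return True if the reduced denominator of q is a power of two."""
    d = q.denominator              # Fraction always stores lowest terms, d >= 1
    return (d & (d - 1)) == 0      # a power of two has exactly one bit set

print(is_dyadic(Fraction(1, 32 + 16)))               # False: 48 = 2^4 * 3
print(is_dyadic(Fraction(1, 32) + Fraction(1, 16)))  # True:  3/32

# Fraction(float) converts the stored binary value exactly, so comparing it
# with the intended fraction shows whether the float holds that value.
print(Fraction(1 / 48) == Fraction(1, 48))           # False: 1/48 was rounded
print(Fraction(3 / 32) == Fraction(3, 32))           # True:  3/32 is exact
```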

That said, a floating-point type imposes further restrictions. For example, an IEEE double has only 53 bits of mantissa (significand), so 1/2 + 1/2^500 is not representable.

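So a sum of powers of two is representable as long as the largest and smallest exponents in the sum fit within a window of 53 consecutive powers (and the exponents themselves stay within the range the format supports).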
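The 53-bit window can be seen directly in Python. This is only a sketch, and it assumes Python's `float` is an IEEE 754 double, which holds on mainstream platforms. The last part uses a scaled-down version of the "add 1/32 repeatedly" experiment from the question (32,000 additions rather than 32 million) so that it runs quickly.

```python
# Assumes Python's float is an IEEE 754 double.
half = 0.5
tiny = 2.0 ** -500                 # exactly representable on its own

# 1/2 + 1/2^500 would need about 500 significand bits, so the sum rounds to 0.5.
print(half + tiny == half)         # True

# Exponents 0 and -52 span exactly 53 bits: the sum is representable.
print(1.0 + 2.0 ** -52 > 1.0)      # True
# Exponents 0 and -53 span 54 bits: the sum rounds back to 1.0.
print(1.0 + 2.0 ** -53 == 1.0)     # True

# Every partial sum is a multiple of 1/32 below 2^10, far inside 53 bits,
# so repeated addition of 1/32 accumulates no rounding error.
total = 0.0
for _ in range(32 * 1000):
    total += 1 / 32
print(total == 1000.0)             # True
```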

To generalize this to other bases:

A number can be exactly represented in base 10 if the prime factorization of the denominator consists of only 2's and 5's.
A rational number X can be exactly represented in base N if the prime factorization of the denominator of X contains only primes found in the factorization of N.
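The general rule for base N can be sketched in Python as follows. The function name `terminates_in_base` is hypothetical. It repeatedly strips from the denominator any factor shared with the base, and the number has a finite expansion exactly when nothing is left.

```python
from fractions import Fraction
from math import gcd

def terminates_in_base(q: Fraction, base: int) -> bool:
    """Return True if q has a finite expansion in the given base (base >= 2)."""
    d = q.denominator              # already in lowest terms
    g = gcd(d, base)
    while g > 1:
        while d % g == 0:          # remove every copy of the shared factor
            d //= g
        g = gcd(d, base)
    return d == 1                  # only primes of the base were present

print(terminates_in_base(Fraction(1, 48), 2))    # False: factor 3 remains
print(terminates_in_base(Fraction(3, 32), 2))    # True
print(terminates_in_base(Fraction(1, 40), 10))   # True:  40 = 2^3 * 5
print(terminates_in_base(Fraction(1, 3), 10))    # False
print(terminates_in_base(Fraction(1, 48), 6))    # True:  6 = 2 * 3
```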

answered Oct 05 '22 19:10

Mysticial

A finite number can be represented in the common IEEE 754 double-precision format if and only if it equals M•2^e for some integers M and e such that -2⁵³ < M < 2⁵³ and -1074 ≤ e ≤ 971.

For single precision, -2²⁴ < M < 2²⁴ and -149 ≤ e ≤ 104.
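The M•2^e condition can be turned into a direct test on exact rationals. The following Python sketch uses a hypothetical helper, `fits_binary_format`, parameterized by the bounds quoted above. It reduces the value to an odd M times a power of two, then checks whether some allowed shift of M puts e inside the permitted range.

```python
from fractions import Fraction

def fits_binary_format(q, sig_bits: int, e_min: int, e_max: int) -> bool:
    """True if q == M * 2**e with |M| < 2**sig_bits and e_min <= e <= e_max."""
    q = Fraction(q)
    if q == 0:
        return True                      # M = 0 works with any e
    d = q.denominator
    if d & (d - 1):
        return False                     # denominator is not a power of two
    m = abs(q.numerator)
    e = -(d.bit_length() - 1)            # q = +/- m * 2**e so far
    while m % 2 == 0:                    # make m odd, so e is as large as possible
        m //= 2
        e += 1
    if m.bit_length() > sig_bits:
        return False                     # too many significant bits
    # M may be shifted left by up to `spare` bits, lowering e by the same amount.
    spare = sig_bits - m.bit_length()
    return e >= e_min and e - spare <= e_max

DOUBLE = (53, -1074, 971)
SINGLE = (24, -149, 104)

print(fits_binary_format(Fraction(3, 32), *DOUBLE))                       # True
print(fits_binary_format(Fraction(1, 48), *DOUBLE))                       # False
print(fits_binary_format(Fraction(1, 2) + Fraction(1, 2**500), *DOUBLE))  # False
print(fits_binary_format(2**24 + 1, *SINGLE))                             # False
print(fits_binary_format(2**24 + 1, *DOUBLE))                             # True
```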

For double precision, these are consequences of the facts that the format uses 52 bits to store a significand (which normally has 53 bits due to an implicit leading 1) and uses 11 bits to store an exponent. Eleven bits encode numbers from 0 to 2047, but 0 and 2047 are reserved for special purposes, and the encoded number is biased by 1023, so it represents unbiased exponents from -1022 to 1023. However, these unbiased exponents are for significands in the interval [1, 2), and those significands have fractional parts. To express the significand as an integer, I shifted the exponent range down by 52, which gives -1074 to 971. Single precision is similar, with 23 bits to store a 24-bit significand, 8 bits for the exponent, and a bias of 127, so the unbiased range of -126 to 127 shifted down by 23 gives -149 to 104.
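To see the fields described above, here is a Python sketch that unpacks a double with the standard `struct` module and rebuilds the integer form M•2^e. The function names are made up for illustration, and the sketch assumes Python's `float` is an IEEE 754 double. It also relies on a detail not spelled out above: an encoded exponent of 0 marks zero and subnormal numbers, which have no implicit 1 and use e = -1074.

```python
import struct
import sys
from fractions import Fraction

def decode_double(x: float):
    """Split a double into (sign bit, biased exponent, stored 52-bit fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    biased = (bits >> 52) & 0x7FF          # 11-bit exponent field
    fraction = bits & ((1 << 52) - 1)      # 52 stored significand bits
    return sign, biased, fraction

def as_integer_times_power(x: float):
    """Return integers (M, e) with x == M * 2**e, for finite x."""
    sign, biased, fraction = decode_double(x)
    if biased == 0x7FF:
        raise ValueError("infinity or NaN has no M * 2**e form")
    if biased == 0:
        m, e = fraction, -1074             # zero or subnormal: no implicit 1
    else:
        m = fraction | (1 << 52)           # restore the implicit leading 1
        e = biased - 1023 - 52             # unbias, then shift by 52
    return (-m if sign else m), e

for value in (1.0, 3 / 32, 5e-324, sys.float_info.max):
    m, e = as_integer_times_power(value)
    exact = Fraction(m) * Fraction(2) ** e
    print(value, m, e, exact == Fraction(value))
```

The largest finite double comes out as M = 2^53 - 1 with e = 971, and the smallest positive one as M = 1 with e = -1074, matching the bounds in the answer.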

Expressing the representable numbers using an integer times a power of two rather than the more common fractional significand simplifies some number theory and other reasoning about floating-point properties. I used it in this answer because it allows the set of representable values to be expressed concisely.

answered Oct 05 '22 17:10

Eric Postpischil
