Why do higher-precision floating point formats have so many exponent bits?

I've been looking at floating point formats, both IEEE 754 and x87. Here's a summary:

                Total       Bits per field
Precision       Bits    Sign  Exponent  Mantissa
Single          32      1     8         23  (+1 implicit)   
Double          64      1     11        52  (+1 implicit)
Extended (x87)  80      1     15        64
Quadruple       128     1     15        112 (+1 implicit) 
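To see the field split concretely, here is a small sketch (my own illustration, not part of the original question) that slices a double into the sign, exponent, and mantissa fields from the table by reinterpreting its 64 bits as an integer:

```python
import struct

def double_fields(x):
    # Reinterpret the 64-bit double as an unsigned integer to slice its fields.
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    sign = bits >> 63                   # 1 sign bit
    exponent = (bits >> 52) & 0x7FF     # 11 exponent bits (bias 1023)
    mantissa = bits & ((1 << 52) - 1)   # 52 explicit mantissa bits
    return sign, exponent, mantissa

# 1.0 is (+1) * 1.0 * 2^(1023 - 1023): biased exponent 1023, mantissa 0.
print(double_fields(1.0))  # (0, 1023, 0)
```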

My question is, why do the higher-precision formats have so many exponent bits? Single-precision gets you a maximum value on the order of 10^38, and I can see how in extreme cases (number of atoms in the universe) you might need a larger exponent. But double-precision goes up to ~10^308, and extended- and quadruple-precision have even more exponent bits. This seems much larger than could ever be necessary for actual hardware-accelerated computation. (It's even more absurd with negative exponents!)

That being said, the mantissa bits are so obviously valuable that I figure there must be a good reason to sacrifice them in favor of the exponent. So what is it? I thought it might be to represent the difference between two adjacent values without needing subnormals, but even that doesn't take a big change in the exponent (-6 out of a full range of +1023 to -1022 for a double).

asked Nov 23 '16 by Adam Haun
1 Answer

The IEEE-754 floating-point standard grew out of work professor William Kahan of UC Berkeley had done as a consultant to Intel when Intel embarked on the creation of the 8087 math coprocessor. One of the design criteria for what became the IEEE-754 floating-point formats was functional compatibility with existing proprietary floating-point formats to the largest extent possible. The book

John F. Palmer and Stephen P. Morse, "The 8087 Primer". Wiley, New York 1984.

specifically mentions the 60-bit floating-point format of the CDC 6600, with an 11-bit exponent and 48-bit mantissa, with respect to the double-precision format.

The following published interview (which inexplicably mangles Jerome Coonen's name into Gerome Kunan) provides a brief overview of the genesis of IEEE-754, including a discussion of the choice of floating-point formats:

Charles Severance, "IEEE 754: An Interview with William Kahan", IEEE Computer, Vol. 31, No. 3, March 1998, pp. 114-115 (online)

In the interview, William Kahan mentions adoption of the floating-point formats of the extremely popular DEC VAX minicomputers, in particular the F format for single precision with 8 exponent bits, and the G format for double precision with 11 exponent bits.

The VAX F format goes back to DEC's earlier PDP-11 architecture, and the rationale for choosing 8 exponent bits is stated in PDP-11/40 Technical Memorandum #16: a desire to be able to represent all important physical constants, including the Planck constant (6.626070040 x 10^-34) and the Avogadro constant (6.022140857 x 10^23).
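As a quick sanity check (my own illustration, not from the memorandum), both constants do fit within an 8-bit-exponent single-precision format, whose normal range spans roughly 10^-38 to 10^38:

```python
import struct

def to_f32(x):
    # Round-trip through a 32-bit float to simulate single precision.
    return struct.unpack('<f', struct.pack('<f', x))[0]

# Both survive the round trip as nonzero finite values:
print(to_f32(6.626070040e-34))  # Planck constant, near the bottom of the range
print(to_f32(6.022140857e23))   # Avogadro constant, comfortably in range
```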

The VAX had originally used the D format for double precision, which used the same number of exponent bits, namely 8, as the F format. This was found to cause trouble through underflow in intermediate computations, for example in the LAPACK linear algebra routines, as noted in a contribution by James Demmel in NA Digest Sunday, February 16, 1992 Volume 92 : Issue 7. This issue is also alluded to in the interview with Kahan, in which it is mentioned that the subsequently introduced VAX G format was inspired by the CDC 6600 floating-point format.
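That underflow hazard of an 8-bit exponent is easy to reproduce. The following sketch (my own, simulating single precision with Python round trips through 32-bit storage) squares a small magnitude that an 11-bit exponent handles but an 8-bit exponent cannot:

```python
import struct

def to_f32(x):
    # Round-trip through a 32-bit float to simulate an 8-bit-exponent format.
    return struct.unpack('<f', struct.pack('<f', x))[0]

a = 1e-25                             # representable in both formats
print(a * a)                          # 1e-50: fine with an 11-bit exponent
print(to_f32(to_f32(a) * to_f32(a)))  # 0.0: underflows the 8-bit exponent range
```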

David Stevenson, "A Proposed Standard for Binary Floating-Point Arithmetic", IEEE Computer, Vol. 14, No. 3, March 1981, pp. 51-62 (online)

explains the choice of number of exponent bits for IEEE-754 double precision as follows:

For the 64-bit format, the main consideration was range; as a minimum, the desire was that the product of any two 32-bit numbers should not overflow the 64-bit format. The final choice of exponent range provides that a product of eight 32-bit terms cannot overflow the 64-bit format — a possible boon to users of optimizing compilers which reorder the sequence of arithmetic operations from that specified by the careful programmer.
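This range guarantee can be checked directly. In the sketch below (my own illustration), even the eighth power of the largest single-precision value stays finite in double precision, though only barely:

```python
import math
import struct

# Largest finite single-precision value, (2 - 2^-23) * 2^127:
FLT_MAX = struct.unpack('<f', struct.pack('<I', 0x7F7FFFFF))[0]

print(math.isfinite(FLT_MAX * FLT_MAX))  # True: one product cannot overflow
print(math.isfinite(FLT_MAX ** 8))       # True: nor can eight factors
```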

The "extended" floating-point types of IEEE-754 were introduced specifically as intermediate formats that ease implementation of accurate standard mathematical functions for the corresponding "regular" floating-point types.

Jerome T. Coonen, "Contributions to a Proposed Standard for Binary Floating-Point Arithmetic". PhD dissertation, Univ. of California, Berkeley 1984

states that precursors were extended accumulators in the IBM 709x and Univac 1108 machines, but I am not familiar with the formats used for those.

According to Coonen, the choice of the number of mantissa bits in extended formats was driven by the needs of binary-decimal conversion as well as general exponentiation x^y. Palmer and Morse mention exponentiation as well and provide details: Due to the error magnification properties of exponentiation, a naive computation utilizing an extended format requires as many additional bits in the mantissa as there are bits in the exponent of the regular format to deliver accurate results. Since double precision uses 11 exponent bits, 64 mantissa bits are therefore required for the double-extended format.
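The arithmetic behind that width, as I read the Palmer and Morse argument, works out as follows (my own restatement):

```python
# Double precision: 52 explicit + 1 implicit = 53 significand bits,
# plus 11 exponent bits whose magnitude the naive exponentiation
# computation can amplify into the significand:
double_significand_bits = 52 + 1
double_exponent_bits = 11
extended_mantissa_bits = double_significand_bits + double_exponent_bits
print(extended_mantissa_bits)  # 64, the x87 double-extended mantissa width
```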

I checked the draft documents published ahead of the release of the IEEE-754 standard in addition to Coonen's PhD thesis and was unable to find a stated rationale for the choice of 15 exponent bits in the double-extended format.

From personal design experience with x87 floating-point units I am aware that the straightforward implementation of elementary math functions, without danger of intermediate overflow, motivates at least three additional exponent bits. The use of 15 bits specifically may be an artifact of the hardware design. The 8086 CPU used 16-bit words as a basic building block, so a requirement of 64 mantissa bits in the double-extended format would lead to a format comprising 80 bits (= five words), leaving 15 bits for the exponent.
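Under that reading, the field widths fall out of the word size arithmetically (my own back-of-the-envelope restatement):

```python
word_bits = 16        # basic building block of the 8086
mantissa_bits = 64    # required by the exponentiation argument above
sign_bits = 1

# Smallest whole number of 16-bit words holding sign + mantissa:
words = -(-(mantissa_bits + sign_bits) // word_bits)   # ceiling division
total_bits = words * word_bits
exponent_bits = total_bits - sign_bits - mantissa_bits
print(words, total_bits, exponent_bits)  # 5 80 15
```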

answered Jan 03 '23 by njuffa