
Why do we need both a round bit and a sticky bit in IEEE 754 floating point implementations?

In my university lecture we just learnt about IEEE 754 arithmetic using the following table:

Guard  Round  Sticky  Result
0      x      x       Round down (do nothing to significand)
1      1      x       Round up
1      0      1       Round up
1      0      0       Round significand to even low digit

As the table shows, the round bit and the sticky bit could be merged into a single bit (set whenever either of the two is set), and the table would yield the same results.

So my question thus is: Why do we need both?

aeternum asked Oct 16 '25 19:10


2 Answers

Three bits are needed because a normalization in subtraction can cause a shift left, leaving only two bits to indicate rounding. Note that guard, round, and sticky bits are a feature of implementations of addition and subtraction; they are not specified in IEEE 754.

Consider this subtraction in a format with four-bit significands:

 1.000×2⁵
−1.001×2¹

Our addition/subtraction hardware has three extra bits, to be the guard, round, and sticky bits:

       GRS
 1.000 000×2⁵
−1.001 000×2¹

We start by shifting the second operand right to have the same exponent as the first. Bits shift through the guard and round positions normally, but, once any 1 bit shifts into the sticky position, that position stays 1 for any further right shifts. So we have:

       GRS
 1.000 000×2⁵
−0.000 101×2⁵
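
The alignment step can be sketched in a few lines of Python. This is only an illustration of the sticky behaviour described above, not how any particular hardware does it; the function name `align_with_grs` and the convention of storing the significand as an integer (1.001 as `0b1001`) are assumptions made for the example.

```python
def align_with_grs(sig, shift):
    """Shift a significand right by `shift` places, keeping G, R and S.

    `sig` holds the significand bits as an integer (1.001 -> 0b1001).
    The result has three extra low bits: guard, round, sticky.
    """
    wide = sig << 3                  # append G, R, S = 0, 0, 0
    for _ in range(shift):
        old_sticky = wide & 1        # the sticky position before this shift
        wide >>= 1                   # G and R shift normally; R moves into S
        wide |= old_sticky           # once sticky is 1, it stays 1
    return wide


# 1.001 x 2^1 aligned to exponent 5 needs a right shift by 4.
print(format(align_with_grs(0b1001, 4), "07b"))  # 0000101, i.e. 0.000 101
```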

Then we subtract:

       GRS
 1.000 000×2⁵
−0.000 101×2⁵
--------------
 0.111 011×2⁵

This result is not normalized (it does not start with 1), so we need to shift it left, giving:

 1.110 11 ×2⁴

That shift left is what the guard bit was guarding against. Now the remaining two bits tell us to round up. If we did not have the third bit, we would have only the single 1 bit after the significand, which would represent exactly ½ the LSB and be insufficient to distinguish between the less than ½, exactly ½, and greater than ½ cases.
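
To see the difference concretely, here is a small Python sketch that runs the same subtraction once with three extra bits and once with only two (a guard bit plus a merged round/sticky bit). It is a toy model under stated assumptions: four-bit significands, `a > b > 0`, `a_exp >= b_exp`, and the helper names `shift_right_sticky` and `subtract` are made up for this example.

```python
P = 4  # significand width in bits, as in the example above


def shift_right_sticky(x, n):
    """Shift x right by n places, OR-ing every bit that falls off into the lowest bit."""
    for _ in range(n):
        x = (x >> 1) | (x & 1)
    return x


def subtract(a_sig, a_exp, b_sig, b_exp, extra):
    """Compute a - b keeping `extra` low bits, the last of which is sticky.

    Assumes a > b > 0 and a_exp >= b_exp. Returns (significand, exponent)
    after round to nearest, ties to even.
    """
    a = a_sig << extra
    b = shift_right_sticky(b_sig << extra, a_exp - b_exp)
    d, exp = a - b, a_exp
    # Normalize: shift left until the leading 1 is back in place.
    while d < (1 << (P - 1 + extra)):
        d <<= 1
        exp -= 1
    kept, rest = d >> extra, d & ((1 << extra) - 1)
    half = 1 << (extra - 1)
    if rest > half or (rest == half and kept & 1):
        kept += 1
        if kept == 1 << P:           # rounding carried out of the top bit
            kept >>= 1
            exp += 1
    return kept, exp


# 1.000 x 2^5 - 1.001 x 2^1 = 32 - 2.25 = 29.75; the nearest 4-bit value is 30.
print(subtract(0b1000, 5, 0b1001, 1, extra=3))  # (15, 4): 1.111 x 2^4 = 30
print(subtract(0b1000, 5, 0b1001, 1, extra=2))  # (14, 4): 1.110 x 2^4 = 28
```

With two extra bits the leftover after the left shift is a lone 1, which looks like an exact tie and is rounded to even, giving 28 instead of the correctly rounded 30.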

Note a subtraction can require more than one bit of left shift, as in:

       GRS
 1.000 000×2⁵
−0.111 100×2⁵
--------------
 0.000 100×2⁵

However, this occurs only if the exponents of the two operands differed by at most one. In that case at most one bit was shifted into the guard, round, and sticky positions, so all further bits are known to be zero and we do not need additional hardware to record them.
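
The claim about exponent differences can be checked by brute force for four-bit significands. The following Python sketch computes each difference exactly and counts how many left shifts renormalization needs; the function name `left_shifts_needed` and the integer encoding of significands are assumptions of this example.

```python
P = 4  # significand width in bits


def left_shifts_needed(a_sig, b_sig, gap):
    """Left shifts needed to renormalize a - b, computed exactly.

    a_sig and b_sig are normalized P-bit integers; b's exponent is
    `gap` below a's, so a is scaled up by 2**gap to line them up.
    """
    d = (a_sig << gap) - b_sig       # exact difference, in units of b's LSB
    top = P - 1 + gap                # bit index of a's leading 1
    return top - (d.bit_length() - 1)


sigs = range(1 << (P - 1), 1 << P)   # all normalized 4-bit significands
worst = {
    gap: max(
        left_shifts_needed(a, b, gap)
        for a in sigs
        for b in sigs
        if (a << gap) > b            # only positive differences
    )
    for gap in range(6)
}
print(worst)  # {0: 3, 1: 4, 2: 1, 3: 1, 4: 1, 5: 1}
```

For an exponent gap of two or more the worst case is a single left shift, which is exactly the case the guard bit covers.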

(I adapted this example from this course handout by David A. Wood and Ramkumar Ravikumar.)

Eric Postpischil answered Oct 19 '25 11:10


Why do we need both?

All three, (G)uard*1, (R)ound*1 and (S)ticky, are needed in select cases when we are using round to nearest, ties to even.


Subtracting 2 values near a power-of-2

a is a power-of-2 and b is somewhat less.

Consider the subtraction of the fractions a - b (say 11 digits instead of common 53 for brevity):

Example:
a and b have the same sign bit,
a's exponent > b's exponent.
a11 is 1, [a10 ... a00] are all 0.
b11 is 1.

  a11 a10 a09 a08 a07 a06 a05 a04 a03 a02 a01 a00
-                         b11 b10 b09 b08 b07 b06 b05 b04 b03 b02 b01 b00
                                                          ^---^---^---^-- b's Sticky
                                                      ^------------------ b's Round
                                                  ^---------------------- b's Guard

// b's sticky bit is true if any of bits [b03...b00] is nonzero.

// The guard digit is needed to form an 11-bit answer.
  1   0   0   0   0   0   0   0   0   0   0   0   0     0     0
-                         1   b10 b09 b08 b07 b06 b's G b's R b's S
  ----------------------------------------------------------------- 
  0   1   1   1   1   1   0   d05 d04 d03 d02 d01 d00   d's R d's S

Then the round digit and sticky bit of the difference d perform their usual function with "round to nearest, ties to even". Here it is important to keep the round digit and the sticky bit separate.

  Table: Round to nearest, ties to even.
  last bit (d00)  Round>=base/2    Sticky
     X                0            X     Round down to 0   (no add)
     0                1            0     Round down to 0   (no add) !!!
     1                1            0     Round away from 0 (add 1) !!!
     X                1            1     Round away from 0 (add 1)
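
For base 2, the table can be written as a short Python function. This is a minimal sketch; the name `round_to_nearest_even` and the convention of passing the kept digits as an integer are assumptions made for illustration.

```python
def round_to_nearest_even(kept, round_bit, sticky):
    """Apply the table above in base 2.

    kept      -- the retained digits, as an integer (its low bit is d00)
    round_bit -- the first discarded bit
    sticky    -- 1 if any later discarded bit is nonzero
    """
    if round_bit == 0:
        return kept                  # below half: no add
    if sticky:
        return kept + 1              # above half: add 1
    return kept + (kept & 1)         # exact tie: add 1 only if d00 is odd


print(round_to_nearest_even(0b0110, 1, 0))  # 6: tie, d00 even, no add
print(round_to_nearest_even(0b0111, 1, 0))  # 8: tie, d00 odd, add 1
print(round_to_nearest_even(0b0110, 1, 1))  # 7: above half, add 1
print(round_to_nearest_even(0b0110, 0, 1))  # 6: below half, no add
```

The two tie rows are the ones that break if the round digit and the sticky bit have been merged too early, because a merged bit cannot tell "exactly half" from "more than half".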

In other cases

These cases are far more common.

In these cases the difference a-b needs no left shift, so the guard position is not shifted up to become d00 and b's guard digit is not needed as an extra digit of the answer.

Instead the round and sticky are updated. Here the sticky bit is rolled up into the round digit as OP suggests.

sticky <--- sticky or (round > 0)
round  <--- guard >= base/2
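
For base 2, where `guard >= base/2` is simply `guard == 1`, a quick exhaustive check in Python shows that this roll-up gives the same decision as the three-bit table from the question whenever no left shift is involved. The function names below are hypothetical and exist only for this sketch.

```python
from itertools import product


def decide_with_grs(lsb, g, r, s):
    """The question's table: return 1 to add one to the significand, else 0."""
    if g == 0:
        return 0                     # below half: round down
    if r or s:
        return 1                     # above half: round up
    return lsb                       # exact tie: round to even


def decide_rolled_up(lsb, g, r, s):
    """Roll the bits up as in the two assignments above, then round."""
    sticky = s | r                   # sticky <--- sticky or (round > 0)
    rnd = g                          # round  <--- guard >= base/2 (base 2)
    if rnd == 0:
        return 0
    if sticky:
        return 1
    return lsb


# All 16 combinations of (last kept bit, G, R, S) agree.
agree = all(
    decide_with_grs(*bits) == decide_rolled_up(*bits)
    for bits in product((0, 1), repeat=4)
)
print(agree)  # True
```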

The table Round to nearest, ties to even is then applied.


*1 G and R are more like digits than bits. With base 2, a digit and a bit are the same, but with base 10, each needs to be a full digit.

chux - Reinstate Monica answered Oct 19 '25 10:10