
Stealing bits from an IEEE 754 double

What are the effects on floating-point math likely to be if the least-significant bit(s) of the significand is(/are) set to a random value?

Explanation:

The language PicoLisp allocates all values using one structure, the cell, which consists of two "machine words". On a 32-bit system, this means a cell is an eight-byte structure of two 32-bit pointers or integers. Cells are aligned to their size, which means that at least the lowest three bits of a word are free to be used as type and GC tag data.
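
For illustration, on a 32-bit machine such a cell might be declared roughly like this (a hypothetical sketch, not PicoLisp's actual source; the type and macro names here are made up):

#include <stdint.h>

/* Two machine words, allocated with 8-byte alignment, so a pointer to a
   cell always has its low three bits clear and they can carry type/GC
   tag information instead. */
typedef struct cell {
    uint32_t car;   /* word 1: pointer or immediate value, low bits usable as tags */
    uint32_t cdr;   /* word 2: pointer or immediate value, low bits usable as tags */
} cell;

#define TAG_BITS 0x7u                                        /* the three spare low bits */
#define CELL_PTR(w)  ((cell *)(uintptr_t)((w) & ~TAG_BITS))  /* strip the tag bits */
#define CELL_TAG(w)  ((w) & TAG_BITS)                        /* read the tag bits */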

PicoLisp is very minimalist. One of the (many) things the language lacks is any support whatsoever for floating-point numbers, instead relying entirely on what the documentation calls "scaled fixpoint" representation. I thought it would be fun to try to add floating point support.

On 32-bit systems a 64-bit float will fit neatly inside one cell, conveniently meaning the allocation system can be pretty much the same, except for one minor problem: all 64 bits will be used by the double. But the GC expects to use bit 0 as a GC mark bit. Proceeding naively, bit 0 will be set to zero after every collection cycle regardless of what value was actually stored in the double.

(This is assuming the sizes and endianness all line up correctly. Assume for the purposes of this that they do; if they don't then the entire question is completely irrelevant and a different strategy necessarily has to be used.)

So: how much of a problem is this for general-purpose math, using the hardware float operations?

If all it does is reduce the precision of the double by a tiny amount, then I figure that's not actually a problem, as long as it is documented that floating-point math in the interpreter isn't as precise as the user might expect, and that they should fall back to fixpoint or a library if they need strictly accurate behaviour. My intuitive understanding is that this ought to be the case, since it's the least significant bit (it doesn't even show up when you convert to a string..?).
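
As a rough way to see the effect, one can try something along these lines (assuming IEEE 754 binary64 and a sweep that simply clears bit 0 of the stored pattern):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
  double d = 1.0 / 3.0;                  /* low bit of the significand happens to be 1 */
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  bits &= ~(uint64_t)1;                  /* what a sweep clearing the mark bit would do */
  double swept;
  memcpy(&swept, &bits, sizeof swept);
  printf("%g vs %g\n", d, swept);        /* identical at default precision */
  printf("%.17g vs %.17g\n", d, swept);  /* differ in the last place only */
  return 0;
}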

On the other hand, floating point is, uh, witchcraft. Could this sort of bit-fiddling actually severely affect the usefulness of math or the ability to produce any kind of consistent results?

(I have considered several other implementation possibilities for the allocator. I'm specifically interested in whether this strategy is monumentally stupid or not, because it's the easiest and I am lazy.)

asked May 08 '13 by Leushenko
2 Answers

As long as outside code always sees the value as if the low bit has been rounded off, and you do this by rounding the mantissa to the nearest even value, for normal calculations that would be okay.

That is, for mantissas that end in:

00: do nothing

10: do nothing

01: subtract 1 from the mantissa

11: add 1 to the mantissa (on overflow, you will need to increment the exponent and clear the mantissa)

If you aren't consistent with your rounding and just lop off the low bit, you will introduce a very slight downward bias to your calculations. Rounding towards even is IEEE's way of counteracting that downward bias.
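
For concreteness, that rule might be implemented along these lines (a sketch, assuming the usual IEEE 754 binary64 layout; it rounds the stolen low bit away, half to even, and a carry out of an all-ones mantissa bumps the exponent automatically because of the encoding order):

#include <stdint.h>
#include <string.h>

double round_away_low_bit(double d)
{
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);
  /* leave infinities and NaNs alone; see the caveat below */
  if ((bits & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL)
    return d;
  switch (bits & 3) {
    case 1: bits -= 1; break;   /* ...01 -> ...00 */
    case 3: bits += 1; break;   /* ...11 -> ...00 with a carry upward */
    default: break;             /* ...00 and ...10: low bit already clear */
  }
  memcpy(&d, &bits, sizeof d);
  return d;
}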

Be careful with +/- infinity, as setting the low bit will turn those into NaNs, which are pretty brittle to work with (suddenly all your comparison ops start failing).
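
A quick check of that hazard (again assuming the standard binary64 bit patterns):

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <math.h>

int main(void)
{
  double inf = INFINITY;                 /* bit pattern 0x7FF0000000000000 */
  uint64_t bits;
  memcpy(&bits, &inf, sizeof bits);
  bits |= 1;                             /* the stolen bit gets set */
  double marked;
  memcpy(&marked, &bits, sizeof marked);
  printf("%d %d\n", isnan(marked) != 0, marked == inf);   /* prints "1 0" */
  return 0;
}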

answered by StilesCrisis

The scheme StilesCrisis is proposing causes double rounding, which is generally considered a bad thing.

One other option I would like to suggest:

Display and compute with each PicoLisp float as if it were 2^512 times larger than it actually is. This means double addition and subtraction remain almost unchanged, multiplication and division require one cheap adjustment, and other operations (library calls) require two adjustments, one before and one after.

After each operation, check for overflow (which now happens more often, every time the biased result is above 1.0).

If you do this, instead of borrowing the least significant bit of the significand, you are actually borrowing the most significant bit of the exponent. This requires some bit-shuffling to load and store floats but this will be much easier to explain to programmers using the system, and algorithms designed for IEEE 754-like properties will continue to work (except when they now overflow).
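
To make those adjustments concrete, the arithmetic wrappers might look roughly like this (my own sketch of the scheme, assuming stored values are the user-visible values scaled down by 2^512; a real implementation would also need the overflow check described above, and very small products can underflow before the rescale):

#include <math.h>

double scaled_add(double a, double b) { return a + b; }              /* unchanged */
double scaled_sub(double a, double b) { return a - b; }              /* unchanged */
double scaled_mul(double a, double b) { return ldexp(a * b, 512); }  /* one adjustment */
double scaled_div(double a, double b) { return ldexp(a / b, -512); } /* one adjustment */

/* library calls need two adjustments: unscale, call, rescale */
double scaled_sqrt(double a) { return ldexp(sqrt(ldexp(a, 512)), -512); }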


The code might look like this lightly tested implementation. A similar implementation in another context is the subject of this blog post, which provides more explanation.

#include <string.h>   /* memcpy */
#include <math.h>     /* ldexp */

/* view the same 64 bits either as a double or as an integer */
union u { double d; unsigned long long u; };

void smalldouble_to_cell(void *p, double d)
{
  union u u;
  u.d = d;
  unsigned long long rest = u.u & 0x7fffffffffffffff;   /* exponent + mantissa */
  unsigned long long packed;
  if (rest > 0x7ff0000000000000)
    /* NaN: keep the original bits, just clear bit 0 for the GC */
    packed = u.u & 0xfffffffffffffffe;
  else
    {
      unsigned long long sign = u.u & 0x8000000000000000;
      if (rest >= 0x3ff0000000000000)
        /* |d| >= 1.0 (including infinity): clamp to the overflow pattern */
        rest = 0x3ff0000000000000;
      /* the top exponent bit is known to be 0 here, so shifting left by one
         loses nothing and leaves bit 0 free for the GC mark */
      packed = sign | (rest << 1);
    }
  memcpy(p, &packed, 8);
}

void double_to_cell(void *p, double d)
{
  /* scale down by 2^512 so ordinary values land below 1.0 in magnitude */
  smalldouble_to_cell(p, ldexp(d, -512));
}
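
A possible inverse, not shown in the answer, might look like this (a sketch only; it reuses the includes and the union from the listing above and maps the clamped overflow pattern back to +/- infinity):

double cell_to_double(const void *p)
{
  unsigned long long packed;
  union u u;
  memcpy(&packed, p, 8);
  unsigned long long rest = packed & 0x7fffffffffffffff;
  if (rest >= 0x7ff0000000000000)
    {
      /* NaNs were stored with their original bits (bit 0 cleared) */
      u.u = packed;
      return u.d;
    }
  if (rest == 0x7fe0000000000000)
    /* the saturated pattern produced by the overflow clamp */
    return (packed & 0x8000000000000000) ? -HUGE_VAL : HUGE_VAL;
  /* undo the shift, then undo the 2^-512 scaling */
  u.u = (packed & 0x8000000000000000) | (rest >> 1);
  return ldexp(u.d, 512);
}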
answered by Pascal Cuoq