 

Why not use a two's complement based floating-point?

The IEEE 754 formats for float64, float32, and float16 use a sign-magnitude significand and a biased exponent. As a student designing hardware architectures, it makes more sense to me to use two's complement for both the significand and the exponent.

For example, the 32-bit (single-precision) float is defined such that the first bit is the sign, the next 8 bits are the exponent (biased by 127), and the last 23 bits are the mantissa. To implement addition/multiplication of negative numbers, we need to convert the mantissa to two's complement and back. The resulting hardware is quite complicated.
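To make the layout concrete, here is a small Python sketch that splits a single-precision float into its three IEEE 754 fields (field widths and the 127 bias as described above):

```python
import struct

def ieee754_fields(x: float) -> tuple[int, int, int]:
    """Split a single-precision float into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF      # 23 bits, implicit leading 1 for normals
    return sign, exponent, fraction

# 1.0 is stored as sign=0, exponent=127 (i.e. 2^0), fraction=0
print(ieee754_fields(1.0))   # (0, 127, 0)
# -2.5 = -1.25 * 2^1: sign=1, exponent=128, fraction bits = 0.01...
print(ieee754_fields(-2.5))  # (1, 128, 2097152)
```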

Instead, consider a format where the first 8 bits represent the exponent and the last 24 bits represent the mantissa, both in two's complement. Bit shifting, addition, and multiplication become relatively straightforward, and the hardware is less complicated. In addition, we get a unique zero for the significand (sign-magnitude representation has two zeros).
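One possible convention for such a format (this is a sketch of my own, not a standard; I interpret the 24-bit significand as a two's complement integer M and the value as M * 2^E):

```python
def decode_custom(bits: int) -> float:
    """Decode a hypothetical 32-bit format: 8-bit two's complement exponent E
    in the top byte, 24-bit two's complement significand M below it.
    Value = M * 2**E (one convention among several possible)."""
    exp = (bits >> 24) & 0xFF
    if exp >= 0x80:
        exp -= 0x100           # sign-extend the 8-bit exponent
    sig = bits & 0xFFFFFF
    if sig >= 0x800000:
        sig -= 0x1000000       # sign-extend the 24-bit significand
    return sig * 2.0 ** exp

def encode_custom(sig: int, exp: int) -> int:
    """Pack significand and exponent back into 32 bits (no normalization)."""
    return ((exp & 0xFF) << 24) | (sig & 0xFFFFFF)

bits = encode_custom(sig=-3, exp=-1)
print(decode_custom(bits))  # -1.5, i.e. -3 * 2^-1
```

Note that negative values need no special casing in the decode: sign extension alone recovers them, which is the point of the representation.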

I searched for months to find reasons for these design decisions and found these:

  1. Two's complement representations are more difficult to compare.

This is true: we need an adder (subtractor) to compare two's complement values. However, for pipelined architectures such as GPUs and my own FPGA-based CNN accelerator, we need to avoid variable delay. Comparing a sign-magnitude representation bit by bit, iteratively, makes it impossible to predetermine the delay. In my opinion, a subtraction is better in this case.
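The subtraction-based comparison has fixed latency. A Python model of what the subtractor circuit computes (the standard signed less-than condition, "sign of the difference XOR overflow", shown here for 8-bit operands):

```python
def signed_lt(a_bits: int, b_bits: int, width: int = 8) -> bool:
    """Fixed-latency two's complement comparison using one subtraction:
    less = N xor V, where N is the sign bit of (a - b) and V is the
    signed-overflow flag of the subtraction."""
    mask = (1 << width) - 1
    top = width - 1
    diff = (a_bits - b_bits) & mask
    n = (diff >> top) & 1                    # sign of the difference
    sa = (a_bits >> top) & 1
    sb = (b_bits >> top) & 1
    v = 1 if (sa != sb and n != sa) else 0   # overflow: operand signs differ
                                             # and result sign flips from a's
    return bool(n ^ v)

# -128 < 1 holds, even though the raw subtraction overflows:
print(signed_lt(0x80, 0x01))  # True
print(signed_lt(0x01, 0xFF))  # False (1 < -1 is false)
```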

  2. Historical reasons: handling NaNs and infs

Maybe we could allocate one or two bits for these and make the significand 23 bits.

  3. Distinct +0 and -0, such that 1/+0 = +inf and 1/-0 = -inf

Now this is a valid reason. It's not really applicable to my use case, but I wonder if it would have been better to implement this with an additional bit.

My use case

I am building a CNN accelerator on an FPGA. Predefined delays for multiplication and addition and minimal hardware complexity are crucial for me. I don't perform division, and I don't have to worry about infs and NaNs.

Therefore I have decided to use a custom internal floating-point representation based on two's complement, as described above. Are there any obvious disadvantages I should be careful about?

Abarajithan asked Dec 17 '22


2 Answers

This is a well-studied topic, and there are systems that use two's complement floating-point representations, typically those that predate IEEE 754, though recent incarnations exist too. See this paper for a study of the properties of such a system: https://hal.archives-ouvertes.fr/hal-00157268/document

Kahan himself (the principal designer of the IEEE 754 standard) argued that having separate +0 and -0 is important for the approximations floating-point is typically used for, where it matters whether a floating-point zero result is essentially positive or negative. See https://people.freebsd.org/~das/kahan86branch.pdf for details.

So, yes: it is entirely possible to have two's complement floats, but the standard picked a sign-magnitude representation. Whichever you pick, some operations will be easy and some will be harder, comparison being the most obvious. Of course, there's nothing stopping you from picking whatever representation suits your needs best if you're designing your own hardware! In particular, you can even go with so-called unums and posits, where the exponent and significand portions are not fixed-size but depend on where you land in the range. See here: https://www.johndcook.com/blog/2018/04/11/anatomy-of-a-posit-number/

alias answered Dec 20 '22


The reason two's complement is used for integer operations is that it allows the same hardware and instructions to serve both signed and unsigned operations, with just a tiny difference in how overflow is detected. With floating point, no one cares about "unsigned" floating point, so there's no benefit (savings) to using two's complement if you're implementing it at the bit level. The only way I can see an advantage to using two's complement is if you are using hardware that already has two's-complement ALUs of some kind.
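To illustrate the sharing: in two's complement, one addition circuit produces the bit pattern that is correct under both interpretations. A small Python model (8-bit, for illustration):

```python
def add_bits(a: int, b: int, width: int = 8) -> int:
    """Bit-level addition modulo 2**width: the same result bits serve
    both the signed and the unsigned interpretation."""
    return (a + b) & ((1 << width) - 1)

def as_signed(bits: int, width: int = 8) -> int:
    """Reinterpret a bit pattern as a two's complement signed integer."""
    return bits - (1 << width) if bits >> (width - 1) else bits

s = add_bits(0xFE, 0x03)  # 254 + 3 (unsigned) or -2 + 3 (signed)
print(s)                  # 1: correct unsigned sum mod 256
print(as_signed(s))       # 1: also the correct signed sum
```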

Two's complement also has a major asymmetry problem in its representation (there is one more representable value below zero than above it), which causes all kinds of mathematical stability issues in any situation that requires rounding or potential loss of precision, such as floating point is commonly used for.
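The asymmetry is easy to see in a small width (8 bits here for illustration): the most negative value has no positive counterpart, so even negation cannot be done without overflow.

```python
width = 8
lo = -(1 << (width - 1))      # most negative representable value
hi = (1 << (width - 1)) - 1   # most positive representable value
print(lo, hi)                 # -128 127: one more value below zero

# Negating the most negative value wraps back to its own bit pattern:
print((-lo) & 0xFF)           # 128, i.e. the encoding of -128 again
```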

Chris Dodd answered Dec 20 '22