Assuming you are considering an IEEE-754-style format for floating-point numbers, like single and double precision, what is the smallest floating-point format you could possibly have?
I know there are half-floats and minifloats, but how small can you go before the format stops making sense? I realize there may not be any practical applications that would make such a format useful.
I'm trying to determine the smallest mantissa bit width and the smallest exponent width you could have.
For instance, does it make sense to have a mantissa that is in X.X format (assuming single precision would be represented as X.XXXXXXXXXXXXXXXXXXXXXXX)? Also, does it make sense to have an exponent with width 1?
As an example of what I'm thinking:
If you had an X.X format and no exponent, then your only possible numbers are +/- {1.0, 1.1} (in binary), but is there something fundamental about floating-point numbers or the format that makes this impossible to consider?
I have occasionally used a four-bit FP format: 2 exponent bits and 1 significand bit. This gives you the following set of values:
encoding   value
x000       +/-0.0
x001       +/-0.5
x010       +/-1.0
x011       +/-1.5
x100       +/-2.0
x101       +/-3.0
x110       +/-Inf
x111       NaN
Obviously, you can't do much useful computation with this format, but it's useful for teaching because it's the smallest format that gives you all of the interesting edge cases (no signaling NaN, though, if you care about that, unless you want to make "-NaN" signaling).
In some sense, this is the "smallest" floating-point format that isn't totally degenerate, but you'd still never use it because it's worse in basically every way than a 4-bit signed fixed-point format with one fractional bit. The smallest floating-point format that really passes this test in a general setting is half precision (though there are some niche uses for 8b formats).
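Here is a minimal decode sketch (Python, with my own helper name decode_minifloat), assuming the usual IEEE-754 conventions: bias 2^(ebits-1) - 1, an all-ones exponent reserved for Inf/NaN, and an all-zeros exponent for zero and subnormals. It reproduces the table above:

    def decode_minifloat(bits, ebits, mbits):
        """Decode an unsigned integer as a sign/exponent/mantissa minifloat."""
        bias = (1 << (ebits - 1)) - 1                  # standard IEEE-754 bias
        sign = -1.0 if (bits >> (ebits + mbits)) & 1 else 1.0
        exp = (bits >> mbits) & ((1 << ebits) - 1)
        frac = bits & ((1 << mbits) - 1)
        if exp == (1 << ebits) - 1:                    # all-ones exponent
            return float("nan") if frac else sign * float("inf")
        if exp == 0:                                   # zero and subnormals
            return sign * (frac / (1 << mbits)) * 2.0 ** (1 - bias)
        return sign * (1 + frac / (1 << mbits)) * 2.0 ** (exp - bias)

    # 4-bit format: 2 exponent bits, 1 significand bit (sign bit left at 0).
    for code in range(8):
        print(f"x{code:03b}  {decode_minifloat(code, ebits=2, mbits=1)}")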
The three-bit format with no significand bits almost works; it gives you +/-0, +/-1, +/-2, and +/-Inf, but there's no NaN encoding available if you follow the usual IEEE-754 encoding rules. It would be nicer to use b010 for Inf and b011 for NaN, but then no rounding ever occurs in arithmetic (except for 1 + 1 overflowing), which isn't very useful for teaching.
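Running the same decode sketch from above with ebits=2 and mbits=0 shows this directly:

    # 3-bit format: 2 exponent bits, no significand bits (sign bit left at 0).
    for code in range(4):
        print(f"b0{code:02b}  {decode_minifloat(code, ebits=2, mbits=0)}")
    # -> 0.0, 1.0, 2.0, inf; the all-ones exponent with an (empty, zero)
    #    mantissa field decodes to Inf, and no pattern is left over for NaN.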
In general, the whole point of floating-point numbers is to be able to represent a wider range of values, whether small or large, than a fixed-width binary integer or fixed-point representation permits.
The smallest practical format I've come across is a tiny 8-bit floating-point representation. It looks like this:
[ 1-bit sign ] [ 4-bit exponent ] [ 3-bit mantissa/fraction ]
In this case, your range for the exponent is limited from 1/64 to 128 (because you need a representation for NaN/infinity). Recall that FP is evaluated as sign x (1 + mantissa) x 2^(exponent - bias).
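As a quick sanity check of those bounds (a sketch assuming the standard bias of 2^(4-1) - 1 = 7 for the 4-bit exponent):

    bias = 7
    # Smallest normal scale: exponent field 0001 -> 2^(1 - 7)  = 1/64
    # Largest  normal scale: exponent field 1110 -> 2^(14 - 7) = 128  (1111 is Inf/NaN)
    print(2.0 ** (1 - bias), 2.0 ** (14 - bias))    # 0.015625 128.0
    # Example: encoding 0 1000 101 -> (+1) * (1 + 5/8) * 2^(8 - 7) = 3.25
    print((1 + 5 / 8) * 2.0 ** (8 - 7))             # 3.25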
You can continue to extrapolate from the IEEE-754 format and even come up with a 6-bit floating point representation:
[ 1-bit sign ] [ 3-bit exponent ] [ 2-bit mantissa/fraction ]
but what ends up happening is that the representable values cluster closer to zero (i.e., you can express numbers close to zero with more precision than numbers further away from zero).
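To make that clustering concrete, enumerating the positive values of the 1-3-2 format (a sketch assuming the standard bias of 3) gives steps of 0.0625 next to zero but steps of 2.0 at the top of the range:

    # Positive values of the 6-bit 1-3-2 format, assuming bias 3.
    bias, values = 3, []
    for exp in range(7):                     # exponent field 111 would be Inf/NaN
        for frac in range(4):
            if exp == 0:                     # zero and subnormals
                values.append((frac / 4) * 2.0 ** (1 - bias))
            else:                            # normals: (1 + frac/4) * 2^(exp - bias)
                values.append((1 + frac / 4) * 2.0 ** (exp - bias))
    print(values)   # 0.0, 0.0625, 0.125, 0.1875, 0.25, ..., 8.0, 10.0, 12.0, 14.0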
I guess you could keep going until you just run out of bits (maybe you drop the sign, or you change the bias depending on your application and which valid values you need), but at some point you'll need to reconsider calling your format "floating point".