I'm trying to convert an int into a custom float, in which the user specifies the number of bits reserved for the exponent and the mantissa, but I don't understand how the conversion works. My function takes an int value and an int exp to represent the number (value * 2^exp), e.g. value = 12, exp = 4 represents 192, but I don't understand the process I need to follow to convert these. I've been looking at this for days and playing with IEEE converter web apps, but I just don't understand what the normalization process is. I see that it's "move the binary point and adjust the exponent", but I have no idea what this means. Can anyone give me an example to go off of? Also, I don't understand what the exponent bias is. The only info I have is that you just add a number to your exponent, but I don't understand why. I've been searching Google for an example I can understand, but this just isn't making any sense to me.
A normalized mantissa has its binary point (the base-two equivalent of a decimal point) just to the left of the most significant non-zero digit. Because the binary number system has just two digits -- zero and one -- the most significant digit of a normalized mantissa is always a one.
A floating point number is normalized when we force the integer part of its mantissa to be exactly 1 and allow its fraction part to be whatever we like. For example, if we were to take the number 13.25, which is 1101.01 in binary, 1101 would be the integer part and 01 would be the fraction part.
I could represent 13.25 as 1101.01*(2^0), but this isn't normalized because the integer part is not 1. However, we are allowed to shift the mantissa one digit to the right (that is, move the binary point one place to the left) if we increase the exponent by 1:
1101.01*(2^0)
= 110.101*(2^1)
= 11.0101*(2^2)
= 1.10101*(2^3)
This representation, 1.10101*(2^3), is the normalized form of 13.25.
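Since the question starts from an int value and an int exp, here is a minimal Python sketch of the same shifting, assuming a positive value. The function name normalize is made up for illustration. It counts how far the binary point has to move left and adds that to the exponent.

```python
from typing import Tuple


def normalize(value: int, exp: int) -> Tuple[str, int]:
    """Rewrite value * 2**exp (value > 0) as 1.fff * 2**e.

    Returns the fraction bits after the leading 1 and the exponent e.
    Hypothetical helper for illustration only.
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive integers")
    shift = value.bit_length() - 1   # places the binary point moves left
    fraction = bin(value)[3:]        # drop the '0b' prefix and the leading 1
    return fraction, exp + shift


# 13.25 is binary 110101 (decimal 53) times 2**-2
print(normalize(53, -2))   # ('10101', 3)
# the question's example: 12 * 2**4 = 192 = 1.100 * 2**7
print(normalize(12, 4))    # ('100', 7)
```

Each left move of the binary point halves the mantissa, so the exponent goes up by one to keep the overall value unchanged.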
That said, we know that normalized floating point numbers will always come in the form 1.fffffff * (2^exp).
For efficiency's sake, we don't bother storing the 1 integer part in the binary representation itself; we just pretend it's there. So if we were to give your custom-made float type 5 bits for the mantissa, we would know the bits 10100 would actually stand for 1.10100.
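A small sketch of the hidden-bit idea in Python. Both function names are hypothetical, and the sketch assumes extra fraction bits are simply truncated rather than rounded the way IEEE 754 would round them.

```python
def store_fraction(fraction: str, mantissa_bits: int) -> str:
    """Fit the bits after the leading 1 into a fixed-width mantissa field.

    Pads with zeros on the right; truncates bits that do not fit
    (an assumption of this sketch, real formats round instead).
    """
    return fraction[:mantissa_bits].ljust(mantissa_bits, "0")


def restore_significand(stored: str) -> float:
    """Put the hidden 1 back: stored bits fff... stand for 1.fff..."""
    return 1 + int(stored, 2) / (1 << len(stored))


print(store_fraction("101", 5))        # 10100
print(restore_significand("10100"))    # 1.625, i.e. binary 1.10100
```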
For example, with the standard 23-bit mantissa, 13.25 = 1.10101*(2^3) is stored with the mantissa bits 10101000000000000000000.
As for the exponent bias, let's take a look at the standard 32-bit float format, which is broken into 3 parts: 1 sign bit, 8 exponent bits, and 23 mantissa bits:
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm
The exponents 00000000 and 11111111 have special purposes (like representing Inf and NaN), so with 8 exponent bits, we could represent 254 different exponents, say 2^1 to 2^254, for example. But what if we want to represent 2^-3? How do we get negative exponents?
The format fixes this problem by automatically subtracting 127 from the exponent. Therefore:
0000 0001 would be 1 - 127 = -126
0010 1101 would be 45 - 127 = -82
0111 1111 would be 127 - 127 = 0
1001 0010 would be 146 - 127 = 19
This changes the exponent range from 2^1 ... 2^254 to 2^-126 ... 2^+127 so we can represent negative exponents.
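The subtraction above, and the matching addition when storing an exponent, can be sketched in Python like this. The function names are made up for illustration; the bias of 127 and the reserved all-zeros and all-ones patterns come from the 8-bit case described above.

```python
BIAS = 127  # bias of the standard 8-bit exponent field


def stored_to_actual(bits: str) -> int:
    """Read an 8-bit exponent field and subtract the bias."""
    return int(bits.replace(" ", ""), 2) - BIAS


def actual_to_stored(e: int) -> str:
    """Add the bias and format as 8 bits; 0 and 255 stay reserved."""
    if not -126 <= e <= 127:
        raise ValueError("exponent outside the normal range")
    return format(e + BIAS, "08b")


print(stored_to_actual("0000 0001"))   # -126
print(stored_to_actual("0010 1101"))   # -82
print(stored_to_actual("0111 1111"))   # 0
print(actual_to_stored(3))             # 10000010
```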
Tommy -- chux and eigenchris, along with the others, have provided excellent answers, but if I am reading your comments correctly, you still seem to be struggling with the nuts and bolts of "how would I take this info and then use it in creating a custom float representation where the user specifies the number of bits for the exponent?" Don't feel bad, it is as clear as mud the first dozen times you go through it. I think I can take a stab at clearing it up.
You are familiar with the IEEE754-Single-Precision-Floating-Point representation of:
IEEE-754 Single Precision Floating Point Representation of (13.25)
 0 1 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -|
|s|      exp      |                  mantissa                   |
That is the 1-bit sign-bit, the 8-bit biased exponent (in 8-bit excess-127 notation), and the remaining 23-bit mantissa.
When you allow the user to choose the number of bits in the exponent, you are going to have to rework the exponent notation to work with the new user-chosen limit.
What will that change?
Will it change the sign-bit handling -- No.
Will it change the mantissa handling -- No (you will still convert the mantissa/significand to "hidden bit" format).
So the only thing you need to focus on is exponent handling.
How would you approach this? Recall that the current 8-bit exponent is in what is called excess-127 notation, where 127 is the largest value that fits in 7 bits, so that any biased exponent can be contained and expressed within the current 8-bit limit. If your user chooses 6 bits as the exponent size, then what? You will have to provide a similar method to ensure you have a fixed number for your new excess-## notation that will work within the user limit.
Take a 6-bit user limit; then a choice for the bias could be tried as 31 (the largest value that can be represented in 5 bits). To that you could apply the same logic (taking the 13.25 example above). Your binary representation for the number is 1101.01, in which you move the binary point 3 positions to the left to get 1.10101, which gives you an unbiased exponent of 3.
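The "largest value in one fewer bit" rule is easy to write down for any exponent width. This is only a sketch; the function name is hypothetical, and it assumes you follow the same excess-N pattern as the standard formats.

```python
def exponent_bias(exp_bits: int) -> int:
    """Largest value that fits in (exp_bits - 1) bits: 2**(exp_bits - 1) - 1."""
    if exp_bits < 2:
        raise ValueError("need at least 2 exponent bits for this scheme")
    return (1 << (exp_bits - 1)) - 1


print(exponent_bias(8))   # 127, the standard single-precision bias
print(exponent_bias(6))   # 31, the excess-31 case discussed here
```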
In your 6-bit exponent case you would add 3 + 31 = 34 to obtain your excess-31 notation for the exponent: 100010. Then put the mantissa in "hidden bit" format (i.e. drop the leading 1 from 1.10101), resulting in your new custom Tommy Precision Representation:
IEEE-754 Tommy Precision Floating Point Representation of (13.25)
 0 1 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
|- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -|
|s|    exp    |                    mantissa                     |
With the 1-bit sign-bit, the 6-bit biased exponent (in 6-bit excess-31 notation), and the remaining 25-bit mantissa.
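Putting the pieces together for the value * 2^exp input from the question, a rough Python sketch might look like the following. The function name encode and its parameters are made up for illustration. It assumes a nonzero integer value, truncates mantissa bits that do not fit, and treats the all-zeros and all-ones exponent patterns as reserved, so zero, subnormals, Inf, and NaN are not handled.

```python
def encode(value: int, exp: int, exp_bits: int, mantissa_bits: int) -> str:
    """Encode value * 2**exp as 'sign exponent mantissa' bit strings.

    Hypothetical sketch: no zero, subnormals, rounding, Inf, or NaN.
    """
    if value == 0:
        raise ValueError("zero needs a special encoding, not covered here")
    sign = "1" if value < 0 else "0"
    magnitude = abs(value)
    bias = (1 << (exp_bits - 1)) - 1            # excess-N bias
    e = exp + magnitude.bit_length() - 1        # exponent after normalizing
    biased = e + bias
    if not 1 <= biased <= (1 << exp_bits) - 2:  # all 0s / all 1s reserved
        raise OverflowError("exponent does not fit in the exponent field")
    fraction = bin(magnitude)[3:]               # drop '0b' and the hidden 1
    mantissa = fraction[:mantissa_bits].ljust(mantissa_bits, "0")
    return f"{sign} {biased:0{exp_bits}b} {mantissa}"


# 13.25 = 53 * 2**-2 with a 6-bit exponent and 25-bit mantissa:
# sign 0, exponent 100010, mantissa 10101 followed by 20 zeros
print(encode(53, -2, 6, 25))
# the question's 12 * 2**4 = 192 in a tiny 1-4-3 format
print(encode(12, 4, 4, 3))    # 0 1110 100
```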
The same rules would apply to reversing the process to get your floating point number back from the above notation (just using 31 instead of 127 to back the bias out of the exponent).
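Going the other way could be sketched like this in Python. The function name decode is hypothetical, and it assumes a normalized number with the hidden 1 and no special exponent patterns.

```python
def decode(bits: str, exp_bits: int) -> float:
    """Decode 'sign exponent mantissa' bits laid out as 1 / exp_bits / rest.

    Hypothetical sketch: assumes a normalized number (hidden 1 present).
    """
    bits = bits.replace(" ", "")
    sign = -1.0 if bits[0] == "1" else 1.0
    biased = int(bits[1:1 + exp_bits], 2)
    stored = bits[1 + exp_bits:]
    bias = (1 << (exp_bits - 1)) - 1
    significand = 1 + int(stored, 2) / (1 << len(stored))  # restore hidden 1
    return sign * significand * 2.0 ** (biased - bias)     # back the bias out


# 6-bit exponent 100010 = 34, and 34 - 31 = 3
print(decode("0 100010 " + "10101" + "0" * 20, 6))   # 13.25
```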
Hopefully this helps in some way. I don't see much else you can do if you are truly going to allow for a user-selected exponent size. Remember, the IEEE-754 standard wasn't something that was guessed at; a lot of good reasoning and trade-offs went into arriving at the 1-8-23 sign-exponent-mantissa layout. However, I think your exercise does a great job at requiring you to firmly understand the standard.
What is not addressed in this discussion is the effect this would have on the range of numbers that could be represented in this Custom Precision Floating Point Representation. I haven't looked at it, but the primary limitation would seem to be a reduction in the MAX/MIN values that could be represented.
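For a rough feel of that range, here is a Python sketch that computes the smallest and largest normalized magnitudes for a given layout. The function name is made up, and it assumes the all-zeros and all-ones exponent patterns stay reserved, as they are in the standard format.

```python
from typing import Tuple


def normal_range(exp_bits: int, mantissa_bits: int) -> Tuple[float, float]:
    """Smallest and largest normalized magnitudes for a custom layout.

    Assumes exponent fields of all 0s and all 1s are reserved.
    """
    bias = (1 << (exp_bits - 1)) - 1
    e_min = 1 - bias                        # stored exponent 00...01
    e_max = ((1 << exp_bits) - 2) - bias    # stored exponent 11...10
    smallest = 2.0 ** e_min                 # 1.000... * 2**e_min
    largest = (2 - 2.0 ** -mantissa_bits) * 2.0 ** e_max  # 1.111... * 2**e_max
    return smallest, largest


print(normal_range(8, 23))   # about 1.18e-38 up to about 3.40e+38
print(normal_range(6, 25))   # about 9.31e-10 up to about 4.29e+09
```

With 6 exponent bits the exponent only spans 2^-30 ... 2^+31, so the representable range shrinks sharply even though the mantissa gains two bits of precision.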