I have to write a program that will simulate floating point multiplication. For this program, we assume that a single precision floating point number is stored in unsigned long a. I have to multiply the number stored in a by 2 using only the following operators: << >> | & ~ ^
I understand the functions of these operators, but I'm confused on the logic of how to go about implementing this. Any help would be greatly appreciated.
"have to multiply the number stored in a by 2 using only the following operators: << >> | & ~ ^"
"since we are given an unsigned long to simulate a float value with a single point of precision, we're supposed to handle all that could be simulated."
First, let us assume that float is encoded as binary32 and that unsigned is 32 bits wide. C does not require either of these.
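One way to pin down both assumptions is to use uint32_t for the bit pattern and to check the width of float at compile time. The following is only a sketch: float_to_bits and bits_to_float are hypothetical helper names, not part of the code in this answer, and the check confirms the size of float, not that it actually uses the binary32 layout.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32. This only verifies the width;
   the layout itself (1 sign, 8 exponent, 23 mantissa bits) is still assumed. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical helper: copy a float's object representation into an integer. */
static uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: reinterpret a 32-bit pattern as a float. */
static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

memcpy is used here instead of a pointer cast because reading a float through a uint32_t pointer is not something the C standard guarantees to work.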
Next, isolate the exponent to distinguish the float sub-groups: sub-normal, normal, infinity and NaN.
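As an illustration of that classification step, here is a minimal sketch under the binary32 assumption. The enum and the name classify_bits are hypothetical and only meant to show how the exponent and mantissa fields separate the sub-groups; zero is listed separately because it shares the all-zero exponent with the sub-normals.

```c
#include <stdint.h>

/* Hypothetical labels for the binary32 sub-groups. */
enum f32_class { F32_ZERO, F32_SUBNORMAL, F32_NORMAL, F32_INFINITY, F32_NAN };

/* Classify a binary32 bit pattern by its exponent and mantissa fields. */
static enum f32_class classify_bits(uint32_t x) {
    uint32_t expo = x & 0x7F800000u;   /* bits 23..30 */
    uint32_t mant = x & 0x007FFFFFu;   /* bits 0..22  */

    if (expo) {
        if ((~expo) & 0x7F800000u) {   /* some exponent bit is 0: below the maximum */
            return F32_NORMAL;
        }
        /* exponent is all ones: infinity when the mantissa is 0, NaN otherwise */
        if (mant) {
            return F32_NAN;
        }
        return F32_INFINITY;
    }
    /* exponent is all zeros: zero or sub-normal, depending on the mantissa */
    if (mant) {
        return F32_SUBNORMAL;
    }
    return F32_ZERO;
}
```

The sign bit plays no part in the classification, so negative zero and negative infinity land in the same groups as their positive counterparts.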
Below is some lightly tested code. I'll review it later; for now, consider it a pseudo-code template.
    #define FLT_SIGN_MASK 0x80000000u
    #define FLT_MANT_MASK 0x007FFFFFu
    #define FLT_EXPO_MASK 0x7F800000u
    // Non-zero when the exponent field e is below its all-ones maximum
    #define FLT_EXPO_LESSTHAN_MAXVALUE(e) ((~(e)) & FLT_EXPO_MASK)
    #define FLT_EXPO_MAX FLT_EXPO_MASK
    #define FLT_EXPO_LSBit 0x00800000u

    // Add 1 to the exponent field with a ripple-carry loop built from ^, & and <<
    unsigned increment_expo(unsigned a) {
      unsigned carry = FLT_EXPO_LSBit;
      do {
        unsigned sum = a ^ carry;
        carry = (a & carry) << 1;
        a = sum;
      } while (carry);
      return a;
    }

    unsigned float_x2_simulated(unsigned x) {
      unsigned expo = x & FLT_EXPO_MASK;
      if (expo) { // x is a normal, infinity or NaN
        if (FLT_EXPO_LESSTHAN_MAXVALUE(expo)) { // x is a normal
          expo = increment_expo(expo); // Double the number
          if (FLT_EXPO_LESSTHAN_MAXVALUE(expo)) { // no overflow
            return (x & (FLT_SIGN_MASK | FLT_MANT_MASK)) | expo;
          }
          // Overflow: return infinity with the sign of x
          return (x & FLT_SIGN_MASK) | FLT_EXPO_MAX;
        }
        // x is an infinity or NaN
        return x;
      }
      // x is a sub-normal or zero
      unsigned m = (x & FLT_MANT_MASK) << 1; // Double the value
      // If doubling set bit 23 (FLT_EXPO_LSBit), the sub-normal became a normal.
      // Special code is not needed: the "carry" lands in the exponent field as exponent 1.
      return (x & FLT_SIGN_MASK) | m;
    }
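The loop in increment_expo is a ripple-carry addition specialised to adding one exponent unit: XOR produces the sum without carries, AND followed by a shift produces the carries, and the two are combined until no carry is left. As a sketch of the general pattern, the same three operators can add any two values; add_bits is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Add two 32-bit values using only ^, & and << (result wraps modulo 2^32). */
static uint32_t add_bits(uint32_t a, uint32_t b) {
    while (b) {
        uint32_t carry = (a & b) << 1;  /* positions where both bits are 1 carry upward */
        a ^= b;                         /* bitwise sum, ignoring carries */
        b = carry;                      /* add the carries in on the next pass */
    }
    return a;
}
```

Adding 0x00800000 to the bit pattern of a normal float with this function has the same effect as incrementing its exponent field, for example turning 0x3F800000 (1.0) into 0x40000000 (2.0).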
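Here is my code, which uses bitwise operators.
It multiplies a single-precision floating-point number by 2 by adding 1 to the exponent field, using only bitwise operators for the arithmetic; it also watches the sign bit and the exponent to detect overflow.
It does not attempt to cover every aspect of floating-point handling; in particular, zero, sub-normals, infinities and NaNs are not treated specially.
Remember how overflow shows up: either the carry runs past the exponent and flips the sign bit (bit 31), or the exponent field (bits 23 to 30) ends up all ones. A change in bit 30 alone is not an overflow, since doubling 1.0 to 2.0 flips it legitimately.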
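    #include <stdio.h>
    #include <string.h>
    #include <inttypes.h>

    int main(void)
    {
        float f = -23.45F;
        uint32_t bits, before, c, sc;

        // Copy the float's bit pattern; memcpy avoids the aliasing problems of a pointer cast
        memcpy(&bits, &f, sizeof bits);
        printf("%08" PRIX32 " %f\n", bits, f);

        before = bits; // keep the original pattern for the overflow check

        // Add 1 to the exponent field (bit 23 upward), rippling the carry with & and ^
        c = bits & (UINT32_C(1) << 23);
        bits ^= (UINT32_C(1) << 23);
        while (c)
        {
            sc = c << 1;
            c = bits & sc;
            bits ^= sc;
        }

        // Overflow: the carry flipped the sign bit, or the exponent field became all ones
        if (((before ^ bits) & 0x80000000u) || ((bits & 0x7F800000u) == 0x7F800000u)) {
            puts("Exponent overflow");
        }

        // Copy the modified pattern back so that f holds the doubled value
        memcpy(&f, &bits, sizeof f);
        printf("%08" PRIX32 " %f\n", bits, f);
        return 0;
    }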
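To see where the exponent-increment approach is valid, it can be compared against the machine's own multiplication. The sketch below assumes binary32 floats; bump_exponent and doubling_matches_hardware are hypothetical names, and the check is only meaningful for the inputs you choose to feed it.

```c
#include <stdint.h>
#include <string.h>

/* Add 1 to the exponent field using only ^, & and <<.
   Only meaningful for normal inputs whose doubled value is still finite. */
static uint32_t bump_exponent(uint32_t bits) {
    uint32_t carry = UINT32_C(1) << 23;   /* lowest bit of the exponent field */
    while (carry) {
        uint32_t next = (bits & carry) << 1;
        bits ^= carry;
        carry = next;
    }
    return bits;
}

/* Return 1 when bump_exponent produces the same bits as computing 2.0f * f. */
static int doubling_matches_hardware(float f) {
    uint32_t bits, expected;
    float doubled = 2.0f * f;
    memcpy(&bits, &f, sizeof bits);
    memcpy(&expected, &doubled, sizeof expected);
    return bump_exponent(bits) == expected;
}
```

For normal values such as -23.45f or 1.0f the two results agree, while for 0.0f they do not, because incrementing the exponent of zero yields the smallest normal number instead of zero. That is the kind of special case the first answer handles explicitly.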
See also: Wikipedia Single-precision floating point