
Simulating Floating Point Multiplication in C using Bitwise Operators [closed]

I have to write a program that will simulate floating point multiplication. For this program, we assume that a single precision floating point number is stored in unsigned long a. I have to multiply the number stored in a by 2 using only the following operators: << >> | & ~ ^

I understand what these operators do, but I'm confused about the logic needed to implement this. Any help would be greatly appreciated.


2 Answers

have to multiply the number stored in a by 2 using only the following operators: << >> | & ~ ^

since we are given an unsigned long to simulate a float value with a single point of precision, we're supposed to handle all that could be simulated. ref

First, let us assume that float is encoded as binary32 and that unsigned is 32-bit. C does not require either of these.
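
These two assumptions can be checked at compile time. The following is a minimal sketch, assuming a C11 compiler; the helper `assumptions_hold` is a hypothetical name that exists only so the size checks can also be called at run time.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>

// Compile-time checks (C11): the build stops if an assumption does not hold.
static_assert(UINT_MAX == 0xFFFFFFFFu, "unsigned must be 32-bit");
static_assert(sizeof(float) * CHAR_BIT == 32, "float must occupy 32 bits");
static_assert(FLT_RADIX == 2 && FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
              "float must have the binary32 precision and range");

// Hypothetical helper: run-time form of the same size checks.
int assumptions_hold(void) {
  return sizeof(float) == sizeof(unsigned) && sizeof(float) * CHAR_BIT == 32;
}
```

These checks cover width, precision and range only; they do not prove full IEEE 754 conformance.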

Next, isolate the exponent to handle each float sub-group separately: zero and sub-normal, normal, infinity, and NaN.
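
As a rough illustration of that split, the sketch below classifies a binary32 bit pattern by its exponent field. The names `f32_class` and `classify_binary32` are hypothetical, and the comparisons use `==` for readability rather than staying within the bitwise-only restriction of the question.

```c
#include <assert.h>
#include <stdint.h>

// Assumption: float is IEEE 754 binary32, so its bit pattern fits a uint32_t.
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

// Hypothetical names, used for illustration only.
enum f32_class { F32_ZERO, F32_SUBNORMAL, F32_NORMAL, F32_INFINITY, F32_NAN };

enum f32_class classify_binary32(uint32_t bits) {
  uint32_t expo = bits & 0x7F800000u;  // 8-bit biased exponent field
  uint32_t mant = bits & 0x007FFFFFu;  // 23-bit mantissa (fraction) field
  if (expo == 0) {                     // exponent all zeros
    return mant ? F32_SUBNORMAL : F32_ZERO;
  }
  if (expo == 0x7F800000u) {           // exponent all ones
    return mant ? F32_NAN : F32_INFINITY;
  }
  return F32_NORMAL;                   // any other exponent
}
```

For example, 0x3F800000 (1.0) is classified as normal, 0x00000001 as sub-normal, and 0x7FC00000 as NaN.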

Below is some lightly tested code; I'll review it later. For now, consider it a pseudocode template.

#define FLT_SIGN_MASK  0x80000000u
#define FLT_MANT_MASK  0x007FFFFFu
#define FLT_EXPO_MASK  0x7F800000u
#define FLT_EXPO_LESSTHAN_MAXLVAUE(e)   ((~(e)) & FLT_EXPO_MASK)
#define FLT_EXPO_MAX   FLT_EXPO_MASK
#define FLT_EXPO_LSBit 0x00800000u

unsigned increment_expo(unsigned a) {
  unsigned carry = FLT_EXPO_LSBit;
  do {
    unsigned sum = a ^ carry;
    carry = (a & carry) << 1;
    a = sum;
  } while (carry);
  return a;
}

unsigned float_x2_simulated(unsigned x) {
  unsigned expo = x & FLT_EXPO_MASK;
  if (expo) { // x is a normal, infinity or NaN
    if (FLT_EXPO_LESSTHAN_MAXLVAUE(expo)) { // x is a normal
      expo = increment_expo(expo);  // Double the number
      if (FLT_EXPO_LESSTHAN_MAXLVAUE(expo)) { // no overflow
        return (x & (FLT_SIGN_MASK | FLT_MANT_MASK)) | expo;
      }
      return (x & FLT_SIGN_MASK) | FLT_EXPO_MAX;
    }
    // x is an infinity or NaN
    return x;
  }
  // x is zero or a sub-normal
  unsigned m = (x & FLT_MANT_MASK) << 1;  // Double the value
  // If doubling carries out of the mantissa, the carry lands in
  // FLT_EXPO_LSBit, so the sub-normal becomes the matching normal
  // (exponent field 1) without any special code.
  return (x & FLT_SIGN_MASK) | m;
}
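
One way to exercise a routine like `float_x2_simulated` is to compare its result with what the hardware produces for `x * 2.0f`. The sketch below provides only the reference side; `float_to_bits`, `bits_to_float` and `reference_x2` are hypothetical helper names, and it assumes that `float` is IEEE 754 binary32 with sub-normals enabled (no flush-to-zero).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Assumption: float and uint32_t have the same size.
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

// Copy the bytes of a float into an integer (memcpy avoids aliasing problems).
uint32_t float_to_bits(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u;
}

// Copy a bit pattern back into a float.
float bits_to_float(uint32_t u) {
  float f;
  memcpy(&f, &u, sizeof f);
  return f;
}

// Reference result: let the floating-point unit do the doubling.
uint32_t reference_x2(uint32_t bits) {
  return float_to_bits(bits_to_float(bits) * 2.0f);
}
```

For non-NaN inputs, `reference_x2(x)` should equal `float_x2_simulated(x)`. For example, the sub-normal 0x00400000 doubles to the smallest normal, 0x00800000, and 0x7F000000 overflows to infinity, 0x7F800000. NaN inputs are better compared by class only, since the hardware may alter a NaN payload.
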
chux - Reinstate Monica answered May 15 '26 08:05


Here is my code that uses bitwise operators.

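This code multiplies a single-precision floating-point number by 2 by adding 1 to its exponent field, using only bitwise operators for the increment. It also checks the sign bit and the exponent field to detect overflow.

It does not claim to cover every aspect of floating-point handling; zero, sub-normal, infinity and NaN inputs are not treated specially.

Remember that an overflow has occurred if the increment carries into the sign bit (bit 31) or leaves the exponent field (bits 30 to 23) all ones, which is the encoding reserved for infinity and NaN. A change in bit 30 alone is not an overflow, because the exponent is stored with a bias of 127: 1.0 (0x3F800000) doubles to 2.0 (0x40000000).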
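
To make the bit layout concrete, the sketch below splits a binary32 pattern into its three fields. The helper names `f32_sign`, `f32_exponent` and `f32_mantissa` are hypothetical, and the layout assumes IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>

// Assumption: float is IEEE 754 binary32 (1 sign, 8 exponent, 23 mantissa bits).
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

// Bit 31: 1 for negative values, 0 otherwise.
uint32_t f32_sign(uint32_t bits)     { return bits >> 31; }

// Bits 30..23: the biased exponent, 0..255.
uint32_t f32_exponent(uint32_t bits) { return (bits >> 23) & 0xFFu; }

// Bits 22..0: the stored fraction, without the implicit leading 1.
uint32_t f32_mantissa(uint32_t bits) { return bits & 0x007FFFFFu; }
```

For -23.45F the pattern is 0xC1BB999A, and its double, -46.9F, is 0xC23B999A. The sign (1) and the mantissa (0x3B999A) are identical in both; only the exponent field moves from 0x83 to 0x84.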

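#include <inttypes.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float f = -23.45F;

    uint32_t i;   // bit pattern of f
    uint32_t sgn;
    uint32_t c,sc;

    // memcpy avoids the strict-aliasing violation of casting &f to uint32_t *
    memcpy(&i, &f, sizeof i);

    printf("%08" PRIX32 " %f\n",i,f);

    sgn = i & 0x80000000U; // remembers the sign bit (bit 31)

    // Add 1 to the exponent field, starting at its lowest bit (bit 23)
    c = i & (1U<<23);
    i ^= (1U<<23);

    // Propagate the carry upwards, one bit at a time
    while(c)
    {
        sc = c << 1;
        c = i & sc;
        i ^= sc;
    }

    // Overflow: the carry reached the sign bit, or the exponent field is
    // now all ones (infinity/NaN). A change in bit 30 alone is not an
    // overflow, since the exponent is biased (1.0F -> 2.0F flips it).
    if (sgn != (i & 0x80000000U) || (i & 0x7F800000U) == 0x7F800000U) {
       puts("Exponent overflow");
    }

    memcpy(&f, &i, sizeof f);
    printf("%08" PRIX32 " %f\n",i,f);

    return 0;
}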

See also: Wikipedia, "Single-precision floating-point format"

Sir Jo Black answered May 15 '26 09:05