Convert int to 16bit float (half precision floating point) in c++

Tags:

How can I convert an integer to a half precision float (which is to be stored into an array unsigned char[2]). The range to the input int will be from 1-65535. Precision is really not a concern.

I am doing something similar for converting to 16bit int into an unsigned char[2], but I understand there is not half precision float C++ datatype. Example of this below:

int16_t position16int = (int16_t)data;
memcpy(&dataArray, &position16int, 2);

987

asked Oct 10 '12 10:10

Ross

2 Answers

It's a very straightforward thing, all the info you need is in Wikipedia.

Sample implementation:

#include <stdio.h>

unsigned int2hfloat(int x)
{
  unsigned sign = x < 0;
  unsigned absx = ((unsigned)x ^ -sign) + sign; // safe abs(x)
  unsigned tmp = absx, manbits = 0;
  int exp = 0, truncated = 0;

  // calculate the number of bits needed for the mantissa
  while (tmp)
  {
    tmp >>= 1;
    manbits++;
  }

  // half-precision floats have 11 bits in the mantissa.
  // truncate the excess or insert the lacking 0s until there are 11.
  if (manbits)
  {
    exp = 10; // exp bias because 1.0 is at bit position 10
    while (manbits > 11)
    {
      truncated |= absx & 1;
      absx >>= 1;
      manbits--;
      exp++;
    }
    while (manbits < 11)
    {
      absx <<= 1;
      manbits++;
      exp--;
    }
  }

  if (exp + truncated > 15)
  {
    // absx was too big, force it to +/- infinity
    exp = 31; // special infinity value
    absx = 0;
  }
  else if (manbits)
  {
    // normal case, absx > 0
    exp += 15; // bias the exponent
  }

  return (sign << 15) | ((unsigned)exp << 10) | (absx & ((1u<<10)-1));
}

int main(void)
{
  printf(" 0: 0x%04X\n", int2hfloat(0));
  printf("-1: 0x%04X\n", int2hfloat(-1));
  printf("+1: 0x%04X\n", int2hfloat(+1));
  printf("-2: 0x%04X\n", int2hfloat(-2));
  printf("+2: 0x%04X\n", int2hfloat(+2));
  printf("-3: 0x%04X\n", int2hfloat(-3));
  printf("+3: 0x%04X\n", int2hfloat(+3));
  printf("-2047: 0x%04X\n", int2hfloat(-2047));
  printf("+2047: 0x%04X\n", int2hfloat(+2047));
  printf("-2048: 0x%04X\n", int2hfloat(-2048));
  printf("+2048: 0x%04X\n", int2hfloat(+2048));
  printf("-2049: 0x%04X\n", int2hfloat(-2049)); // first inexact integer
  printf("+2049: 0x%04X\n", int2hfloat(+2049));
  printf("-2050: 0x%04X\n", int2hfloat(-2050));
  printf("+2050: 0x%04X\n", int2hfloat(+2050));
  printf("-32752: 0x%04X\n", int2hfloat(-32752));
  printf("+32752: 0x%04X\n", int2hfloat(+32752));
  printf("-32768: 0x%04X\n", int2hfloat(-32768));
  printf("+32768: 0x%04X\n", int2hfloat(+32768));
  printf("-65504: 0x%04X\n", int2hfloat(-65504)); // legal maximum
  printf("+65504: 0x%04X\n", int2hfloat(+65504));
  printf("-65505: 0x%04X\n", int2hfloat(-65505)); // infinity from here on
  printf("+65505: 0x%04X\n", int2hfloat(+65505));
  printf("-65535: 0x%04X\n", int2hfloat(-65535));
  printf("+65535: 0x%04X\n", int2hfloat(+65535));
  return 0;
}

Output (ideone):

 0: 0x0000
-1: 0xBC00
+1: 0x3C00
-2: 0xC000
+2: 0x4000
-3: 0xC200
+3: 0x4200
-2047: 0xE7FF
+2047: 0x67FF
-2048: 0xE800
+2048: 0x6800
-2049: 0xE800
+2049: 0x6800
-2050: 0xE801
+2050: 0x6801
-32752: 0xF7FF
+32752: 0x77FF
-32768: 0xF800
+32768: 0x7800
-65504: 0xFBFF
+65504: 0x7BFF
-65505: 0xFC00
+65505: 0x7C00
-65535: 0xFC00
+65535: 0x7C00

103

answered Oct 12 '22 15:10

Alexey Frunze

I asked the question of how to convert 32-bit floating points to 16-bit floating point.

Float32 to Float16

So from that you could very easily convert the int to a float and then use the question above to create a 16-bit float. I would suggest this is probably much easier than going from int directly to 16-bit float. Effectively by converting to 32-bit float you have done most of the hardwork and then you just need to shift a few bits around.

Edit: Looking at Alexey's excellent answer I think its highly likely that using a hardware int to float conversion and then bit shifting it around is likely to be a fair bit faster than his method. Might be worth profiling both methods and comparing them.

answered Oct 12 '22 16:10

Goz

Related questions
                            
                                How to delete printed characters from command line in C++
                            
                                Am I doing something wrong with this CG program?
                            
                                What are the parameters passed to cvFindContours() in JavaCV?
                            
                                C++ Struct that contains a map of itself
                            
                                Segmentation fault at the end of destructor
                            
                                CRTP + Traits class : "no type named..."
                            
                                C++ BNF grammar with parsing/matching examples
                            
                                Map in Shared memory
                            
                                error while loading shared libraries: libgomp.so.1: , wrong GCC version?
                            
                                Qt QTimer is it safe to stop it this way?
                            
                                Auto vs concrete type when iterating over vector?
                            
                                std::function bound to member function
                            
                                template private inheritance in vc++10 is not accessible
                            
                                Questions regarding performance of swap
                            
                                Passing an Object to Overloaded Operator
                            
                                How to convert float to double(both stored in IEEE-754 representation) without losing precision?
                            
                                Assigning base struct to derived struct
                            
                                Should I make a member function virtual just to make a class testable?
                            
                                NMake Pattern Rules
                            
                                C# as a scripting language [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Convert int to 16bit float (half precision floating point) in c++

Tags:

c++

c

floating-point

Ross

People also ask

2 Answers

Alexey Frunze

Goz

Recent Activity

Donate For Us