
Convert/Quantize Float Range to Integer Range

Say I have a float in the range of [0, 1] and I want to quantize and store it in an unsigned byte. Sounds like a no-brainer, but in fact it's quite complicated:

The obvious solution looks like this:

unsigned char QuantizeFloat(float a)
{
  return (unsigned char)(a * 255.0f);
}

This works insofar as I get all numbers from 0 to 255, but the distribution of the integers is not even: the function returns 255 only if a is exactly 1.0f. Not a good solution.

If I do proper rounding I just shift the problem:

unsigned char QuantizeFloat(float a)
{
  return (unsigned char)(a * 255.0f + 0.5f);
}

Here the result 0 covers only half as much of the float range as any other number.
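To make the unevenness concrete, here is a minimal sketch that pushes evenly spaced samples from [0, 1] through both mappings above and counts how often each byte value comes out. The names quantize_truncate, quantize_round and histogram are hypothetical helpers introduced only for this illustration.

```c
#include <stddef.h>

/* The two mappings from the question, under hypothetical names. */
static unsigned char quantize_truncate(float a)
{
    return (unsigned char)(a * 255.0f);
}

static unsigned char quantize_round(float a)
{
    return (unsigned char)(a * 255.0f + 0.5f);
}

/* Feeds n evenly spaced samples from [0, 1] (n >= 2) through q
   and counts how many samples land on each output value. */
static void histogram(unsigned char (*q)(float), size_t n, size_t counts[256])
{
    for (size_t i = 0; i < 256; ++i)
        counts[i] = 0;

    for (size_t i = 0; i < n; ++i) {
        float a = (float)i / (float)(n - 1);
        counts[q(a)]++;
    }
}
```

With a sample count such as 255001, the truncating version should put a single sample into bin 255, while the rounding version should give bin 0 roughly half as many samples as an interior bin. The same appears to hold for bin 255, since its interval is cut off at 1.0.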

How do I quantize so that the floating-point range is divided evenly? Ideally I would like to get an equal distribution of integers when I quantize equally distributed random floats.

Any ideas?


Btw: Although my code is in C, the problem is language-agnostic. For the non-C people: just assume that float-to-int conversion truncates the float.

EDIT: Since we had some confusion here: I need a mapping that maps the smallest input float (0) to the smallest unsigned char, and the highest float of my range (1.0f) to the highest unsigned byte (255).

Nils Pipenbrinck asked Mar 01 '09 15:03




2 Answers

How about a * 256.0f with a check to reduce 256 to 255? So something like:

return (unsigned char) (min(255, (int) (a * 256.0f)));

(For a suitable min function on your platform - I can't remember the C function for it.)

Basically you want to divide the range into 256 equal portions, which is what that should do. The edge case for 1.0 going to 256 and requiring rounding down is just because the domain is inclusive at both ends.
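Written out as a complete C function, the idea might look like the sketch below. It assumes the input is already in [0, 1] and replaces the min call with a plain comparison, since standard C has no integer min function.

```c
/* Sketch of "multiply by 256, then clamp"; assumes 0.0f <= a <= 1.0f. */
unsigned char QuantizeFloat(float a)
{
    int i = (int)(a * 256.0f);  /* truncates: 0..255 for a < 1, 256 for a == 1 */

    if (i > 255)
        i = 255;                /* fold the single edge case a == 1.0f into the top bin */

    return (unsigned char)i;
}
```

Under that assumption, QuantizeFloat(0.0f) gives 0, QuantizeFloat(0.5f) gives 128, and QuantizeFloat(1.0f) gives 255.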

Jon Skeet answered Sep 21 '22 20:09


I think what you are looking for is this:

unsigned char QuantizeFloat (float a)
{
  return (unsigned char) (a * 256.0f);
}

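This maps uniformly distributed float values in [0, 1] to uniformly distributed byte values in [0, 255]. All values in [i/256, (i+1)/256[ (that is, excluding (i+1)/256), for i in 0..255, are mapped to i. What might be undesirable is that 1.0f is mapped to 256.0f, which typically wraps around to 0 when converted (strictly speaking, an out-of-range float-to-integer conversion is undefined behavior in C).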
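The interval claim can be checked mechanically. The sketch below uses two hypothetical helpers: quantize256, which is this mapping plus a clamp so that 1.0f never triggers the out-of-range conversion, and bins_are_uniform, which probes the lower edge of each bin and a point just below its upper edge.

```c
/* Hypothetical helper: the a * 256.0f mapping plus a clamp for a == 1.0f. */
static unsigned char quantize256(float a)
{
    int i = (int)(a * 256.0f);
    return (unsigned char)(i > 255 ? 255 : i);
}

/* Returns 1 if, for every i in 0..255, both i/256 and a value just
   below (i+1)/256 are mapped to i; returns 0 otherwise. */
static int bins_are_uniform(void)
{
    for (int i = 0; i < 256; ++i) {
        float lo = (float)i / 256.0f;                   /* inclusive lower edge */
        float hi = ((float)(i + 1) - 0.001f) / 256.0f;  /* just below the exclusive upper edge */

        if (quantize256(lo) != i || quantize256(hi) != i)
            return 0;
    }
    return 1;
}
```

Because dividing and multiplying by 256 only changes the exponent of a float, the edge values survive the round trip exactly, which is why the check can compare against i directly.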

cr333 answered Sep 22 '22 20:09