
How to convert a uint64_t to a double/float between 0 and 1 with maximum accuracy (C++)?

I'm writing an image class based on unsigned integers. I currently use uint8_t and uint16_t buffers for 8-bit and 16-bit RGBA pixels. To convert from 16-bit to 8-bit, I take the 16-bit value, divide it by std::numeric_limits<uint16_t>::max() converted to a double, then multiply the result by 255.
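For reference, the 16-bit to 8-bit step described above could be sketched as follows. The name `to8` is hypothetical, and rounding to nearest (rather than truncating) is an assumption, since the question does not say how the final value is converted back to an integer.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: scale a 16-bit channel value to 8 bits
// by way of a double in [0, 1].
std::uint8_t to8(std::uint16_t v) {
    const double unit =
        v / static_cast<double>(std::numeric_limits<std::uint16_t>::max());
    // Assumption: add 0.5 before the truncating cast to round to nearest.
    return static_cast<std::uint8_t>(unit * 255.0 + 0.5);
}
```

Every uint16_t value is exactly representable as a double, so this step loses no information before the final rounding; that is the property that no longer holds for uint64_t.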

However, if I wanted to have an image with 64-bit unsigned integers for each RGBA component (I know, it's absurdly high), how would I go about finding a float/double between 0 and 1 that represents how far between 0 and the max uint64_t my pixel value is? I assume that converting to doubles wouldn't work because doubles are generally 64-bit floats, and you can't capture all 64-bit unsigned integer values in a 64-bit float. Dividing without converting to floats/doubles would just give me 0 or sometimes 1.

What is the most accurate way to find a floating point value between 0 and 1 that represents how far between 0 and the maximum possible an unsigned 64-bit value is?

Thomas asked Oct 24 '17 01:10




2 Answers

What is the most accurate way to find a floating point value between 0 and 1 that represents how far between 0 and the maximum possible an unsigned 64-bit value is?

Mapping integer values in the range [0...2^64) to [0 ... 1.0) can be done directly.

  1. Convert from uint64_t to double.

  2. Divide by 2^64 (@Mark Ransom).

     #include <stdint.h>

     #define TWO63 0x8000000000000000u
     #define TWO64f (TWO63*2.0)  // 2^64 as a double

     // Convert u to double, then divide by 2^64.
     double map(uint64_t u) {
       double y = (double) u;
       return y/TWO64f;
     }
    

This will map:

Integer values in the range [2^63...2^64) to [0.5 ... 1.0): 2^52 different double values.
Integer values in the range [2^62...2^63) to [0.25 ... 0.5): 2^52 different double values.
Integer values in the range [2^61...2^62) to [0.125 ... 0.25): 2^52 different double values.
...
Integer values in the range [2^52...2^53) to [2^-12 ... 2^-11): 2^52 different double values.
Integer values in the range [0...2^52) to [0 ... 2^-12): 2^52 different double values.


Mapping integer values in the range [0...2^64) to [0 ... 1.0] is more difficult. (Note the ] vs. ).)


[Feb 2021] I see this answer needs re-explanation on upper edge cases. Potential values returned include 1.0.
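The upper edge case can be reproduced with a small sketch. It assumes an IEEE 754 binary64 `double` and the default round-to-nearest mode; the C++ standard leaves the rounding of an inexact integer-to-double conversion implementation-defined, so other platforms may behave differently. The name `map_unit` is hypothetical and simply restates the convert-then-divide mapping above.

```cpp
#include <cmath>
#include <cstdint>

// Same idea as map() above: convert to double, then divide by 2^64.
double map_unit(std::uint64_t u) {
    const double two64 = std::ldexp(1.0, 64);  // 2^64, exactly representable
    // Near the top of the range the conversion cannot be exact. Under
    // round-to-nearest (an assumption), UINT64_MAX becomes 2^64 here.
    return static_cast<double>(u) / two64;
}
```

Under those assumptions, `map_unit(UINT64_MAX)` compares equal to 1.0, while `map_unit(UINT64_MAX - 2047)` stays below 1.0, because 2^64 - 2048 is the largest double below 2^64 and converts exactly.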

chux - Reinstate Monica answered Sep 20 '22 16:09


You can get a start from the following code for Java's java.util.Random nextDouble() method. It takes 53 bits and forms a double from them:

   return (((long)next(26) << 27) + next(27))
     / (double)(1L << 53);

I would use the most significant 26 bits of your uint64_t value for the shifted value, and the next 27 bits to fill in the low-order bits. That discards the least significant 64 - 53 = 11 bits of the input.
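In C++ the same idea reduces to a single shift, because the 26-bit and 27-bit pieces together are just the top 53 bits of the input. The following is a minimal sketch rather than a definitive implementation; `top53_to_unit` is a hypothetical name, and the code assumes `double` is IEEE 754 binary64 with a 53-bit significand.

```cpp
#include <cmath>
#include <cstdint>

// Keep the 53 most significant bits of u and scale them by 2^-53.
// Under the binary64 assumption the shifted value converts exactly and
// the scaling only adjusts the exponent, so no rounding takes place.
double top53_to_unit(std::uint64_t u) {
    const std::uint64_t top = u >> 11;  // discard the low 64 - 53 = 11 bits
    return std::ldexp(static_cast<double>(top), -53);
}
```

With this approach the largest possible result is 1 - 2^-53, so the output never reaches 1.0. The cost is that inputs differing only in their low 11 bits map to the same double, even for small inputs where a direct conversion would keep them distinct.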

If distinguishing very small values is especially important you could also use subnormal numbers, which nextDouble() does not return.

Patricia Shanahan answered Sep 17 '22 16:09