Convert "__m256 with random-bits" into float values of [0, 1] range

I have a __m256 value that holds random bits.

I would like to "interpret" it to obtain another __m256 that holds float values uniformly distributed in the [0.0f, 1.0f] range.

I am planning to do it like this:

__m256 randomBits = /* generated random bits, uniformly distributed */;
//intended as the smallest increment of float precision
//(note: numeric_limits<float>::min() is actually the smallest positive normalized float, about 1.18e-38)
__m256 invFloatRange =  _mm256_set1_ps( std::numeric_limits<float>::min() );

__m256 float01 =  _mm256_mul_ps(randomBits, invFloatRange);
//float01 is now ready to be used

Question 1:

However, will this cause a problem in the very rare case where randomBits has all bits set to 1 and is therefore NaN?

What can I do to protect myself from this?

I want float01 to always be a usable number.
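For reference, here is a small scalar sketch (C++, assuming IEEE-754 binary32 `float`; the helper names `bits_as_float` and `exponent_all_ones` are hypothetical) showing that the all-ones pattern is not the only problem. Any bit pattern whose 8 exponent bits are all ones encodes NaN or ±infinity, and that covers 2^24 of the 2^32 possible patterns, roughly 1 in 256. `std::memcpy` is used for the reinterpretation because it is well-defined in C++.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE-754 binary32 float (no numeric conversion).
// Hypothetical helper for illustration.
inline float bits_as_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// True when the 8 exponent bits are all ones, i.e. the pattern encodes NaN or +/-infinity.
// Hypothetical helper for illustration.
inline bool exponent_all_ones(std::uint32_t bits) {
    return ((bits >> 23) & 0xFFu) == 0xFFu;
}

// 2 sign values * 2^23 mantissa values share the all-ones exponent: 2^24 of 2^32 patterns.
constexpr std::uint64_t kNanOrInfPatterns = std::uint64_t{2} << 23;
constexpr std::uint64_t kAllPatterns = std::uint64_t{1} << 32;
static_assert(kAllPatterns / kNanOrInfPatterns == 256, "about 1 in 256 patterns");
```

For example, `bits_as_float(0x7F800001u)` is NaN even though almost all of its mantissa bits are zero.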

Question 2:

Will the values remain uniformly distributed over [0, 1] after I obtain them using the above approach? I know that float has varying precision at different magnitudes.
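To make the precision remark concrete: the gap between adjacent floats halves each time the magnitude halves, so representable values are much denser near 0 than near 1. A uniform generator therefore has to place its outputs on a fixed grid (for example, multiples of 2^-23 or 2^-24) rather than spread them over every representable float. A minimal sketch, assuming IEEE-754 binary32 `float` (the helper `ulp_above` is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Gap between x and the next representable float above it (one ULP at x).
// Hypothetical helper for illustration; assumes 0 <= x < 2.
inline float ulp_above(float x) {
    return std::nextafter(x, 2.0f) - x;
}
```

With this, `ulp_above(1.0f)` is 2^-23, `ulp_above(0.75f)` is 2^-24 and `ulp_above(0.3f)` is 2^-25.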

asked Dec 31 '20 08:12 by Kari



2 Answers

Treating each 32-bit lane of a __m256i as raw bits and reinterpreting it as a float, one can do:

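 #include <immintrin.h>
 // Maps 32 random bits per lane to a float in [0.0f, 1.0f); needs AVX2 (function name is illustrative).
 inline __m256 bits_to_float01(__m256i a) {
     const __m256i one = _mm256_set1_epi32(0x3f800000);      // bit pattern of 1.0f
     a = _mm256_and_si256(a, _mm256_set1_epi32(0x007fffff)); // keep the 23 mantissa bits
     a = _mm256_or_si256(a, one);                             // exponent of 1.0f -> [1.0f, 2.0f)
     return _mm256_sub_ps(_mm256_castsi256_ps(a), _mm256_castsi256_ps(one)); // -> [0.0f, 1.0f)
 }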

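The AND/OR sequence keeps the 23 least significant bits of each input lane as the mantissa and forces the exponent of 1.0f, which produces values uniformly distributed in 1.0f <= a < 2.0f. Subtracting 1.0f then removes the bias, leaving values in [0.0f, 1.0f) that are all multiples of 2^-23. Because the exponent is always set explicitly, the result can never be NaN or infinity.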
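A scalar equivalent can be handy for unit-testing the vector code on machines without AVX2. This is a sketch assuming IEEE-754 binary32 `float`; the name `bits_to_float01_scalar` is hypothetical, and `std::memcpy` stands in for the `_mm256_castsi256_ps` reinterpretation:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar version of the AND/OR/SUB trick for one lane (hypothetical helper).
inline float bits_to_float01_scalar(std::uint32_t bits) {
    const std::uint32_t one_bits = 0x3F800000u;              // bit pattern of 1.0f
    std::uint32_t u = (bits & 0x007FFFFFu) | one_bits;       // random mantissa, exponent of 1.0f -> [1, 2)
    float f;
    std::memcpy(&f, &u, sizeof f);                           // reinterpret, no numeric conversion
    return f - 1.0f;                                         // remove the bias -> [0, 1)
}
```

For example, an input whose low 23 bits are `0x400000` maps to exactly 0.5f, and `0xFFFFFFFF` maps to 1 - 2^-23, the largest possible output. The upper 9 input bits are ignored.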

answered Nov 14 '22 23:11 by Aki Suihkonen



As @Soonts pointed out, floats uniformly distributed in the [0, 1] range can be created as described here:

https://stackoverflow.com/a/54873925/9007125

I ended up using the answer below:

https://stackoverflow.com/a/54893167/9007125

//converts __m256i values into __m256 values that contain floats in the [0, 1) range.
//https://stackoverflow.com/a/54893167/9007125
#include <immintrin.h>  // AVX2 intrinsics

inline void int_rand_int_toFloat01( const __m256i* m256i_vals,
                                          __m256* m256f_vals){ //<-- stores here (both pointers 32-byte aligned).
    const __m256 c = _mm256_set1_ps(0x1.0p-24f); // 2^-24; or (1.0f / (uint32_t(1) << 24))

    // keep the top 24 random bits per lane: 0 .. 2^24 - 1, each exactly representable as a float
    const __m256i rnd24 = _mm256_srli_epi32(*m256i_vals, 8);
    // '_mm256_cvtepi32_ps' converts 32-bit ints into 32-bit floats (all values are non-negative here)
    const __m256 converted = _mm256_cvtepi32_ps(rnd24);
    *m256f_vals = _mm256_mul_ps(converted, c); // scale into [0, 1 - 2^-24]
}
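Likewise, a scalar sketch of the shift/convert/multiply approach (C++, assuming IEEE-754 binary32 `float`; `bits_to_float01_shift` is a hypothetical name). It keeps the top 24 bits, which fit exactly in a float's 24-bit significand, so the conversion and the scaling are both exact and the outputs are the evenly spaced values k * 2^-24 for k = 0 .. 2^24 - 1. No NaN can appear:

```cpp
#include <cassert>
#include <cstdint>

// Scalar counterpart of int_rand_int_toFloat01 for one lane (hypothetical helper).
inline float bits_to_float01_shift(std::uint32_t bits) {
    const std::uint32_t top24 = bits >> 8;   // 0 .. 2^24 - 1
    // Exact: every integer below 2^24 is representable, and scaling by 2^-24 is exact.
    return static_cast<float>(top24) * (1.0f / 16777216.0f);
}
```

Compared with the mantissa trick, this yields one extra bit of resolution (2^24 instead of 2^23 distinct values).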
answered Nov 14 '22 22:11 by Kari
