I am trying to write a fast BPSK decoder using Intel's AVX intrinsics. I have a set of complex numbers represented as interleaved floats, but due to the BPSK modulation only the real parts (the even-indexed floats) are needed. Every float x is mapped to 0 when x < 0 and to 1 when x >= 0. This is accomplished using the following routine:
static inline void
normalize_bpsk_constellation_points(int32_t *out, const complex_t *in, size_t num)
{
    const __m256 _min_mask = _mm256_set1_ps(-1.0f);
    const __m256 _max_mask = _mm256_set1_ps(1.0f);
    const __m256 _mul_mask = _mm256_set1_ps(0.5f);
    __m256 res;
    __m256i int_res;
    size_t i;
    for (i = 0; i < num; i += COMPLEX_PER_AVX_REG) {
        res = _mm256_load_ps((const float *)&in[i]);
        /* Clamp to [-1, 1] to avoid segmentation faults due to indexing */
        res = _mm256_max_ps(_min_mask, _mm256_min_ps(_max_mask, res));
        /* Scale accordingly for proper indexing: -1 -> 0, 1 -> 1 */
        res = _mm256_add_ps(res, _max_mask);
        res = _mm256_mul_ps(res, _mul_mask);
        /* And then round to the nearest integer */
        res = _mm256_round_ps(res, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
        int_res = _mm256_cvtps_epi32(res);
        _mm256_store_si256((__m256i *)&out[2 * i], int_res);
    }
}
Firstly, I clamp all the received floats to the range [-1, 1]. Then, after the scaling step, the result is rounded to the nearest integer: scaled values above 0.5 map to 1 and those below 0.5 map to 0.
The procedure works fine as long as the input floats are normal numbers. However, due to some situations at previous stages, some input floats may be NaN or -NaN. In that case the NaN values propagate through _mm256_max_ps(), _mm256_min_ps() and all the other AVX functions, and the conversion yields the integer -2147483648 (INT32_MIN), which of course crashes my program due to invalid indexing.
Is there any workaround to avoid this problem, or at least a way to set the NaN values to 0 using AVX?
You could do it the simple way to begin with, compare and mask (not tested):
res = _mm256_cmp_ps(res, _mm256_setzero_ps(), _CMP_NLT_US);
ires = _mm256_srli_epi32(_mm256_castps_si256(res), 31);
Or shift and xor (also not tested):
ires = _mm256_srli_epi32(_mm256_castps_si256(res), 31);
ires = _mm256_xor_si256(ires, _mm256_set1_epi32(1));
This version will also care about the sign of NaN (and ignore the NaN-ness).
Alternative when AVX2 is not available (not tested):
res = _mm256_cmp_ps(res, _mm256_setzero_ps(), _CMP_NLT_US);
res = _mm256_and_ps(res, _mm256_set1_ps(1.0f));
ires = _mm256_cvtps_epi32(res);