I have two arrays, char* c and float* f, and I need to do this operation:
// Compute float mask
float* f;
char* c;
char c_thresh;
int n;
for ( int i = 0; i < n; ++i )
{
if ( c[i] < c_thresh ) f[i] = 0.0f;
else f[i] = 1.0f;
}
I am looking for a fast way to do it: without conditionals and using SSE (4.2 or AVX) if possible.
If using float instead of char can result in faster code, I can change my code to use floats only:
// Compute float mask
float* f;
float* c;
float c_thresh;
int n;
for ( int i = 0; i < n; ++i )
{
if ( c[i] < c_thresh ) f[i] = 0.0f;
else f[i] = 1.0f;
}
Thanks
Pretty easy: do the comparison, sign-extend the byte masks to dwords, and AND with 1.0f. (Not tested, and this isn't meant to be copy&paste code anyway; it's meant to show how you do it.)
movd xmm0, [c] ; read 4 bytes from c
pcmpgtb xmm0, threshold ; signed compare, byte = 0xFF where c > threshold (note: comparison is >, not >=, so threshold holds c_thresh - 1)
pmovsxbd xmm0, xmm0 ; sign-extend bytes to dwords: 0xFF becomes 0xFFFFFFFF (zero-extending would leave 0x000000FF, which ANDs to 0)
pand xmm0, one ; AND all four elements with 1.0f
movdqa [f], xmm0 ; save result (f must be 16-byte aligned)
Should be pretty easy to convert to intrinsics.
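For reference, here is roughly what an intrinsics version might look like. It is a sketch, not tuned code. The function names compute_mask_char and compute_mask_float and the SSE41_TARGET macro are made up for this example. It assumes the chars are signed (pcmpgtb is a signed compare) and makes no alignment assumptions, so it uses unaligned loads and stores. Instead of adjusting the threshold, it computes the "c < c_thresh" mask and uses ANDN, which sidesteps the c_thresh - 1 overflow when c_thresh is -128. The float variant needs no widening step at all.

```c
#include <smmintrin.h>  /* SSE4.1 (_mm_cvtepi8_epi32); pulls in the SSE/SSE2 headers too */
#include <string.h>     /* memcpy */

/* Hypothetical helper macro: lets GCC/Clang build this without -msse4.1. */
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

/* f[i] = (c[i] < c_thresh) ? 0.0f : 1.0f, four chars per iteration. */
SSE41_TARGET
void compute_mask_char(const signed char *c, float *f, int n, signed char c_thresh)
{
    const __m128i thresh = _mm_set1_epi8(c_thresh);
    const __m128 one = _mm_set1_ps(1.0f);
    int i = 0;

    for (; i + 4 <= n; i += 4) {
        int packed;
        memcpy(&packed, c + i, sizeof packed);          /* safe unaligned 4-byte read */
        __m128i bytes = _mm_cvtsi32_si128(packed);      /* movd */
        __m128i lt8   = _mm_cmplt_epi8(bytes, thresh);  /* 0xFF where c < thresh (signed) */
        __m128i lt32  = _mm_cvtepi8_epi32(lt8);         /* pmovsxbd: 0xFF -> 0xFFFFFFFF */
        /* (~mask) & 1.0f: 0.0f where c < thresh, 1.0f elsewhere */
        __m128 r = _mm_andnot_ps(_mm_castsi128_ps(lt32), one);
        _mm_storeu_ps(f + i, r);
    }
    for (; i < n; ++i)                                  /* scalar tail */
        f[i] = (c[i] < c_thresh) ? 0.0f : 1.0f;
}

/* Float-only variant: compare and AND, no conversion needed. */
SSE41_TARGET
void compute_mask_float(const float *c, float *f, int n, float c_thresh)
{
    const __m128 thresh = _mm_set1_ps(c_thresh);
    const __m128 one = _mm_set1_ps(1.0f);
    int i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 v   = _mm_loadu_ps(c + i);
        __m128 nlt = _mm_cmpnlt_ps(v, thresh);          /* all ones where !(c < thresh) */
        _mm_storeu_ps(f + i, _mm_and_ps(nlt, one));
    }
    for (; i < n; ++i)                                  /* scalar tail */
        f[i] = (c[i] < c_thresh) ? 0.0f : 1.0f;
}
```

The char loop only handles 4 elements per iteration to mirror the assembly above. Loading 16 chars at once and widening each group of four would likely be faster, but whether char or float input wins overall is something to measure on your own data.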