How can I take the reciprocal (inverse) of floats with SSE instructions, but only for non-zero values?
Background below:
I want to normalize an array of vectors so that each dimension has the same average. In C this can be coded as:
float vectors[num * dims]; // input data
// step 1. compute the sum on each dimension
float norm[dims];
memset(norm, 0, dims * sizeof(float));
for(int i = 0; i < num; i++) for(int j = 0; j < dims; j++)
    norm[j] += vectors[i * dims + j];
// step 2. convert sums to reciprocal of average
for(int j = 0; j < dims; j++) if(norm[j]) norm[j] = (float)num / norm[j];
// step 3. normalize the data
for(int i = 0; i < num; i++) for(int j = 0; j < dims; j++)
    vectors[i * dims + j] *= norm[j];
Now, for performance reasons, I want to do this using SSE intrinsics. Step 1 and step 3 are easy, but I'm stuck at step 2. I can't find any code sample or obvious SSE instruction that takes the reciprocal of a value only if it is not zero. For the division itself, _mm_rcp_ps does the trick, and I could probably combine it with a conditional move, but how do I get a mask indicating which components are zero?
I don't need the code for the whole algorithm described above, just the "inverse if not zero" function:
__m128 rcp_nz_ps(__m128 input) {
// ????
}
Thanks!
__m128 rcp_nz_ps(__m128 input) {
__m128 mask = _mm_cmpeq_ps(_mm_set1_ps(0.0), input);
__m128 recip = _mm_rcp_ps(input);
return _mm_andnot_ps(mask, recip);
}
Each lane of mask is set to b111...11 if the corresponding input is zero, and to b000...00 otherwise. The and-not with that mask then replaces the elements of the reciprocal that correspond to a zero input with zero.
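One thing to keep in mind is that _mm_rcp_ps only returns an approximation (roughly 12 bits of precision), whereas the scalar code in the question does a full-precision division. Below is a minimal sketch of how the masked reciprocal could be refined with one Newton-Raphson step and plugged into step 2. The names rcp_nz_refined_ps and sums_to_scale are made up for this illustration, and the loop assumes that dims is a multiple of 4.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Approximate reciprocal per lane; lanes whose input is zero yield 0.0f. */
static __m128 rcp_nz_ps(__m128 input) {
    __m128 is_zero = _mm_cmpeq_ps(input, _mm_setzero_ps());
    __m128 recip = _mm_rcp_ps(input);
    return _mm_andnot_ps(is_zero, recip);
}

/* Same idea with one Newton-Raphson step, x1 = x0 * (2 - a * x0),
   to improve on the precision of _mm_rcp_ps (hypothetical helper). */
static __m128 rcp_nz_refined_ps(__m128 input) {
    __m128 is_zero = _mm_cmpeq_ps(input, _mm_setzero_ps());
    __m128 x0 = _mm_rcp_ps(input);
    __m128 x1 = _mm_mul_ps(x0, _mm_sub_ps(_mm_set1_ps(2.0f),
                                          _mm_mul_ps(input, x0)));
    /* For a zero lane x1 is NaN (0 * inf); the mask clears it to 0.0f,
       so the mask must be applied after the refinement, not before. */
    return _mm_andnot_ps(is_zero, x1);
}

/* Step 2 of the question: turn each non-zero sum into num / sum and
   leave zero sums at zero. Assumes dims is a multiple of 4. */
static void sums_to_scale(float *norm, int dims, int num) {
    __m128 n = _mm_set1_ps((float)num);
    for (int j = 0; j < dims; j += 4) {
        __m128 sums = _mm_loadu_ps(norm + j);
        _mm_storeu_ps(norm + j, _mm_mul_ps(n, rcp_nz_refined_ps(sums)));
    }
}
```

If exact results matter more than speed, replacing the _mm_rcp_ps call with _mm_div_ps(_mm_set1_ps(1.0f), input) and keeping the same mask should also work, since the and-not clears the infinity produced by the division by zero.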