I want to achieve this:
for (int i=0;i<n;i++){
if (x[i] > 2.0f || x[i] < -2.0f)
a[i] += x[i];
}
I have gone this far but don't know what to do next:
__m128 P2f = _mm_set1_ps(2.0f);
__m128 M2f = _mm_set1_ps(-2.0f);
for(int i=0;i<n;i+=4){
__m128 xv = _mm_load_ps(x+i);
__m128 av = _mm_load_ps(a+i);
__m128 c1 = _mm_cmpgt_ps(xv, P2f);
__m128 c2 = _mm_cmplt_ps(xv, M2f);
__m128 cv = _mm_or_ps(c1,c2);
=???==
av = _mm_add_ps(av, xv);
_mm_store_ps(a+i, av);
}
You're close:
const __m128 P2f = _mm_set1_ps(2.0f);
const __m128 M2f = _mm_set1_ps(-2.0f);
for (int i = 0; i < n; i += 4)
{
__m128 xv = _mm_load_ps(x + i);
__m128 av = _mm_load_ps(a + i);
__m128 c1v = _mm_cmpgt_ps(xv, P2f);
__m128 c2v = _mm_cmplt_ps(xv, M2f);
__m128 cv = _mm_or_ps(c1v, c2v);
xv = _mm_and_ps(xv, cv);
av = _mm_add_ps(av, xv);
_mm_store_ps(a + i, av);
}
The trick is to OR the two comparison results and then use this combined result as a mask to zero out the X values which do not pass the test, using a bitwise AND operation. You then add the masked X vector, which will add 0 or the original X value to each element of A according to the mask.
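To check the masking idea against the original scalar loop, here is a minimal sketch. The function names are hypothetical, and the sketch makes explicit two things the loop above takes for granted: `_mm_load_ps` and `_mm_store_ps` need 16-byte aligned pointers, and `n` may not be a multiple of 4, so a scalar tail handles the leftover elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* SSE: __m128 and the _mm_*_ps intrinsics */

/* Scalar reference: the loop from the question. */
static void add_outside_scalar(float *a, const float *x, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (x[i] > 2.0f || x[i] < -2.0f)
            a[i] += x[i];
}

/* Masked SSE version (hypothetical name). Assumes a and x are
   16-byte aligned, as _mm_load_ps and _mm_store_ps require. */
static void add_outside_sse(float *a, const float *x, size_t n)
{
    assert((uintptr_t)a % 16 == 0 && (uintptr_t)x % 16 == 0);

    const __m128 p2 = _mm_set1_ps(2.0f);
    const __m128 m2 = _mm_set1_ps(-2.0f);
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 xv = _mm_load_ps(x + i);
        __m128 av = _mm_load_ps(a + i);
        /* All-ones lanes where x > 2 or x < -2, all-zero lanes elsewhere. */
        __m128 mask = _mm_or_ps(_mm_cmpgt_ps(xv, p2), _mm_cmplt_ps(xv, m2));
        /* Keep x where the mask is set, +0.0f elsewhere, then add. */
        av = _mm_add_ps(av, _mm_and_ps(xv, mask));
        _mm_store_ps(a + i, av);
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; ++i)
        if (x[i] > 2.0f || x[i] < -2.0f)
            a[i] += x[i];
}
```

One small difference from the scalar loop: lanes that fail the test still get `+0.0f` added, so an `a[i]` holding `-0.0f` comes back as `+0.0f`. For most uses that should not matter.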
For the alternate version as mentioned in your comment below you'd do this:
const __m128 P2f = _mm_set1_ps(2.0f);
const __m128 M2f = _mm_set1_ps(-2.0f);
for (int i = 0; i < n; i += 4)
{
__m128 xv = _mm_load_ps(x + i);
__m128 av = _mm_load_ps(a + i);
__m128 c1v = _mm_cmpgt_ps(xv, P2f);
__m128 c2v = _mm_cmplt_ps(xv, M2f);
__m128 cv = _mm_or_ps(c1v, c2v);
xv = _mm_and_ps(P2f, cv); // <<< change this line to get a[i] += 2.0f
// instead of a[i] += x[i]
av = _mm_add_ps(av, xv);
_mm_store_ps(a + i, av);
}
For the third version you mention in later comments below (a[i] *= 2.0) it's slightly trickier, but you can do it by thinking of the expression as a[i] += a[i]:
const __m128 P2f = _mm_set1_ps(2.0f);
const __m128 M2f = _mm_set1_ps(-2.0f);
for (int i = 0; i < n; i += 4)
{
__m128 xv = _mm_load_ps(x + i);
__m128 av = _mm_load_ps(a + i);
__m128 c1v = _mm_cmpgt_ps(xv, P2f);
__m128 c2v = _mm_cmplt_ps(xv, M2f);
__m128 cv = _mm_or_ps(c1v, c2v);
xv = _mm_and_ps(av, cv); // <<< change this line to get a[i] *= 2.0f (a[i] += a[i])
// instead of a[i] += x[i]
av = _mm_add_ps(av, xv);
_mm_store_ps(a + i, av);
}
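All three variants are special cases of a per-lane "select": compute the value for the taken branch, then use the mask to pick between it and the old value. The sketch below is one way to write that with plain SSE. `select_ps` and `double_outside_sse` are hypothetical names, not part of any library, and the loop assumes `n` is a multiple of 4. It uses unaligned loads so it makes no alignment assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE: __m128 and the _mm_*_ps intrinsics */

/* Per-lane select: mask ? t : f. Each mask lane must be all-ones or
   all-zeros, which is what the _mm_cmp*_ps intrinsics produce. */
static __m128 select_ps(__m128 mask, __m128 t, __m128 f)
{
    /* _mm_andnot_ps(mask, f) computes (~mask) & f. */
    return _mm_or_ps(_mm_and_ps(mask, t), _mm_andnot_ps(mask, f));
}

/* if (x[i] > 2.0f || x[i] < -2.0f) a[i] *= 2.0f; */
static void double_outside_sse(float *a, const float *x, size_t n)
{
    assert(n % 4 == 0); /* no tail handling in this sketch */

    const __m128 p2 = _mm_set1_ps(2.0f);
    const __m128 m2 = _mm_set1_ps(-2.0f);

    for (size_t i = 0; i < n; i += 4) {
        __m128 xv = _mm_loadu_ps(x + i);
        __m128 av = _mm_loadu_ps(a + i);
        __m128 mask = _mm_or_ps(_mm_cmpgt_ps(xv, p2), _mm_cmplt_ps(xv, m2));
        __m128 doubled = _mm_mul_ps(av, p2);          /* "then" branch   */
        _mm_storeu_ps(a + i, select_ps(mask, doubled, av)); /* else keep a */
    }
}
```

Because the untaken lanes are copied rather than having zero added, this form also works when the "then" branch is not an addition. If SSE4.1 is available, `_mm_blendv_ps` can replace the three-operation select.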
I would only add to Paul's excellent answer that you need only do a single compare, by taking advantage of the symmetry about zero:
const __m128 absMask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff)); // needs <emmintrin.h> (SSE2)
const __m128 two = _mm_set1_ps(2.0f);
for (int i = 0; i < n; i += 4) {
__m128 xv = _mm_load_ps(x + i);
__m128 av = _mm_load_ps(a + i);
__m128 absxv = _mm_and_ps(xv, absMask); // |x|
__m128 mask = _mm_cmpgt_ps(absxv, two); // |x| > 2 ?
xv = _mm_and_ps(xv, mask); // |x| > 2 ? x : 0
av = _mm_add_ps(av, xv); // |x| > 2 ? a + x : a + 0
_mm_store_ps(a + i, av);
}
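If you would rather stay within SSE1 and avoid the integer constant, the sign bit can also be cleared with `_mm_andnot_ps` and a vector of `-0.0f`, whose only set bit is the sign bit. This is a sketch under stated assumptions rather than a drop-in replacement: the function name is hypothetical, `n` is assumed to be a multiple of 4, and unaligned loads are used so no alignment is required.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE: __m128 and the _mm_*_ps intrinsics */

/* if (|x[i]| > 2.0f) a[i] += x[i]; using a single compare. */
static void add_outside_abs_sse(float *a, const float *x, size_t n)
{
    assert(n % 4 == 0); /* no tail handling in this sketch */

    const __m128 sign_bit = _mm_set1_ps(-0.0f); /* 0x80000000 in each lane */
    const __m128 two = _mm_set1_ps(2.0f);

    for (size_t i = 0; i < n; i += 4) {
        __m128 xv = _mm_loadu_ps(x + i);
        __m128 av = _mm_loadu_ps(a + i);
        __m128 absxv = _mm_andnot_ps(sign_bit, xv); /* (~sign_bit) & x == |x| */
        __m128 mask = _mm_cmpgt_ps(absxv, two);     /* |x| > 2 ?              */
        av = _mm_add_ps(av, _mm_and_ps(xv, mask));  /* a + (|x| > 2 ? x : 0)  */
        _mm_storeu_ps(a + i, av);
    }
}
```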