
If statements with comparison SSE in C

Tags:

c

sse

I want to achieve this:

for (int i = 0; i < n; i++) {
  if (x[i] > 2.0f || x[i] < -2.0f) 
     a[i] += x[i]; 
}

I have gone this far but don't know what to do next:

__m128 P2f = _mm_set1_ps(2.0f);
__m128 M2f = _mm_set1_ps(-2.0f);
for (int i = 0; i < n; i += 4) {
__m128 xv = _mm_load_ps(x+i);
__m128 av = _mm_load_ps(a+i);

__m128 c1 = _mm_cmpgt_ps(xv, P2f);
__m128 c2 = _mm_cmplt_ps(xv, M2f);

__m128 or = _mm_or_ps(c1,c2);
    =???==
av = _mm_add_ps(av, xv);
_mm_store_ps(a+i, av);
}
Snebhu asked Mar 12 '13 21:03



2 Answers

You're close:

const __m128 P2f = _mm_set1_ps(2.0f);
const __m128 M2f = _mm_set1_ps(-2.0f);
for (int i = 0; i < n; i += 4)
{
    __m128 xv = _mm_load_ps(x + i);
    __m128 av = _mm_load_ps(a + i);

    __m128 c1v = _mm_cmpgt_ps(xv, P2f);
    __m128 c2v = _mm_cmplt_ps(xv, M2f);

    __m128 cv = _mm_or_ps(c1v, c2v);

    xv = _mm_and_ps(xv, cv);

    av = _mm_add_ps(av, xv);

    _mm_store_ps(a + i, av);
}

The trick is to OR the two comparison results and then use this combined result as a mask to zero out the X values which do not pass the test using a bitwise AND operation. You then add the masked X vector, which will add 0 or the original X value to each element of A according to the mask.
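As a rough, self-contained sketch of the masking idea described above, the following compares a scalar reference loop with an SSE loop. The function names are made up for illustration. Unlike the snippet above, it assumes nothing about alignment (it uses `_mm_loadu_ps`/`_mm_storeu_ps`) and handles an `n` that is not a multiple of 4 with a scalar tail loop; both choices are assumptions, not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128 and the _mm_*_ps intrinsics */

/* Scalar reference: the loop from the question. */
void add_if_outside_scalar(float *a, const float *x, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (x[i] > 2.0f || x[i] < -2.0f)
            a[i] += x[i];
    }
}

/* SSE sketch (hypothetical name): mask x with the comparison result,
 * then add unconditionally. */
void add_if_outside_sse(float *a, const float *x, size_t n)
{
    assert(a != NULL && x != NULL);

    const __m128 p2 = _mm_set1_ps(2.0f);
    const __m128 m2 = _mm_set1_ps(-2.0f);
    size_t i = 0;

    /* Four floats per iteration; unaligned loads, so no alignment assumed. */
    for (; i + 4 <= n; i += 4) {
        __m128 xv = _mm_loadu_ps(x + i);
        __m128 av = _mm_loadu_ps(a + i);

        /* Each lane is all ones where the test passes, all zeros otherwise. */
        __m128 mask = _mm_or_ps(_mm_cmpgt_ps(xv, p2), _mm_cmplt_ps(xv, m2));

        /* x & mask is x where the test passed and +0.0f elsewhere. */
        av = _mm_add_ps(av, _mm_and_ps(xv, mask));
        _mm_storeu_ps(a + i, av);
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; i++) {
        if (x[i] > 2.0f || x[i] < -2.0f)
            a[i] += x[i];
    }
}
```

One small difference from the scalar loop: lanes that fail the test still get `+0.0f` added, so an element of `a` holding `-0.0f` comes back as `+0.0f`. The two values compare equal, so this rarely matters.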


For the alternate version as mentioned in your comment below you'd do this:

const __m128 P2f = _mm_set1_ps(2.0f);
const __m128 M2f = _mm_set1_ps(-2.0f);
for (int i = 0; i < n; i += 4)
{
    __m128 xv = _mm_load_ps(x + i);
    __m128 av = _mm_load_ps(a + i);

    __m128 c1v = _mm_cmpgt_ps(xv, P2f);
    __m128 c2v = _mm_cmplt_ps(xv, M2f);

    __m128 cv = _mm_or_ps(c1v, c2v);

    xv = _mm_and_ps(P2f, cv); // <<< change this line to get a[i] += 2.0f
                              //     instead of a[i] += x[i]

    av = _mm_add_ps(av, xv);

    _mm_store_ps(a + i, av);
}

For the third version you mention in later comments below (a[i] *= 2.0) it's slightly trickier, but you can do it by thinking of the expression as a[i] += a[i]:

const __m128 P2f = _mm_set1_ps(2.0f);
const __m128 M2f = _mm_set1_ps(-2.0f);
for (int i = 0; i < n; i += 4)
{
    __m128 xv = _mm_load_ps(x + i);
    __m128 av = _mm_load_ps(a + i);

    __m128 c1v = _mm_cmpgt_ps(xv, P2f);
    __m128 c2v = _mm_cmplt_ps(xv, M2f);

    __m128 cv = _mm_or_ps(c1v, c2v);

    xv = _mm_and_ps(av, cv);  // <<< change this line to get a[i] *= 2.0f (a[i] += a[i])
                              //     instead of a[i] += x[i]

    av = _mm_add_ps(av, xv);

    _mm_store_ps(a + i, av);
}
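When the conditional update cannot easily be rewritten as "add something or add zero", a more general option is a per-lane select built from AND, ANDNOT and OR. The sketch below is one possible way to write `a[i] *= 2.0f` that way. `select_ps` and `double_if_outside_sse` are hypothetical names, and the code assumes `n` is a multiple of 4. (On SSE4.1 hardware, `_mm_blendv_ps` can play the role of `select_ps`.)

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE only */

/* Per-lane select: take t where the mask bits are set, f elsewhere.
 * _mm_andnot_ps(mask, f) computes (~mask) & f. */
static inline __m128 select_ps(__m128 mask, __m128 t, __m128 f)
{
    return _mm_or_ps(_mm_and_ps(mask, t), _mm_andnot_ps(mask, f));
}

/* Hypothetical helper: a[i] *= 2.0f where x[i] > 2 or x[i] < -2.
 * Assumes n is a multiple of 4. */
void double_if_outside_sse(float *a, const float *x, size_t n)
{
    assert(n % 4 == 0);

    const __m128 p2 = _mm_set1_ps(2.0f);
    const __m128 m2 = _mm_set1_ps(-2.0f);

    for (size_t i = 0; i < n; i += 4) {
        __m128 xv = _mm_loadu_ps(x + i);
        __m128 av = _mm_loadu_ps(a + i);

        __m128 mask = _mm_or_ps(_mm_cmpgt_ps(xv, p2), _mm_cmplt_ps(xv, m2));

        /* Compute both outcomes, then pick one per lane. */
        __m128 doubled = _mm_mul_ps(av, p2);
        _mm_storeu_ps(a + i, select_ps(mask, doubled, av));
    }
}
```

The select form leaves unselected lanes bit-for-bit unchanged, at the cost of two extra bitwise operations per vector.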
Paul R answered Sep 28 '22 04:09


I would only add to Paul's excellent answer that you need only do a single compare, by taking advantage of the symmetry about zero:

const __m128 absMask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff)); // all bits except the sign bit (SSE2, <emmintrin.h>)
const __m128 two = _mm_set1_ps(2.0f);

for (int i = 0; i < n; i += 4) {
    __m128 xv = _mm_load_ps(x + i);
    __m128 av = _mm_load_ps(a + i);
    __m128 absxv = _mm_and_ps(xv, absMask); // |x|
    __m128 mask = _mm_cmpgt_ps(absxv, two); // |x| > 2 ?
    xv = _mm_and_ps(xv, mask);              // |x| > 2 ? x : 0
    av = _mm_add_ps(av, xv);                // |x| > 2 ? a + x : a + 0
    _mm_store_ps(a + i, av);
}
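For completeness, here is a hedged variant of the single-compare idea that stays within SSE1 by building the sign-bit mask from `-0.0f` and clearing it with `_mm_andnot_ps`, instead of using an integer intrinsic. The function name is made up, `n` is assumed to be a multiple of 4, and the code assumes default floating-point settings so that the compiler keeps the sign of the `-0.0f` constant.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE only */

/* Hypothetical helper: a[i] += x[i] where |x[i]| > 2.
 * Assumes n is a multiple of 4. */
void add_if_abs_gt2_sse(float *a, const float *x, size_t n)
{
    assert(n % 4 == 0);

    const __m128 sign = _mm_set1_ps(-0.0f);  /* only the sign bit is set */
    const __m128 two  = _mm_set1_ps(2.0f);

    for (size_t i = 0; i < n; i += 4) {
        __m128 xv = _mm_loadu_ps(x + i);
        __m128 av = _mm_loadu_ps(a + i);

        __m128 absx = _mm_andnot_ps(sign, xv);   /* (~sign) & x == |x| */
        __m128 mask = _mm_cmpgt_ps(absx, two);   /* |x| > 2 ? */

        av = _mm_add_ps(av, _mm_and_ps(xv, mask));
        _mm_storeu_ps(a + i, av);
    }
}
```

NaN inputs behave like the two-compare version: the comparison is false for NaN, so those lanes of `a` are left numerically unchanged.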
Stephen Canon answered Sep 28 '22 03:09