
adding the components of an SSE register

I want to add the four components of an SSE register to get a single float. This is how I do it now:

float a[4];
_mm_storeu_ps(a, foo128);
float x = a[0] + a[1] + a[2] + a[3];

Is there an SSE instruction that directly achieves this?

fredoverflow asked Dec 16 '11 15:12


2 Answers

You can use the SSE3 instruction HADDPS, or its compiler intrinsic _mm_hadd_ps.

For example, see http://msdn.microsoft.com/en-us/library/yd9wecaa(v=vs.80).aspx

If you have two registers v1 and v2:

v = _mm_hadd_ps(v1, v2);
v = _mm_hadd_ps(v, v);

Now element 0 of v contains the sum of v1's components, and element 1 contains the sum of v2's components. To sum a single register, pass it as both arguments.
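The two-step HADDPS reduction described above could be wrapped in a small helper for the single-register case from the question. This is only a sketch: the name hsum_sse3 is made up for illustration, and the target attribute is a GCC/Clang-specific assumption that lets the function compile without -msse3 (other compilers would need their own SSE3 switch).

```c
#include <pmmintrin.h>  /* SSE3 intrinsics, including _mm_hadd_ps */

/* Hypothetical helper: returns v0 + v1 + v2 + v3 for the four floats in v. */
#if defined(__GNUC__) && !defined(__SSE3__)
__attribute__((target("sse3")))  /* assumption: GCC/Clang, built without -msse3 */
#endif
static float hsum_sse3(__m128 v)
{
    v = _mm_hadd_ps(v, v);    /* [v0+v1, v2+v3, v0+v1, v2+v3] */
    v = _mm_hadd_ps(v, v);    /* every lane now holds v0+v1+v2+v3 */
    return _mm_cvtss_f32(v);  /* read lane 0 as a scalar float */
}
```

With this in place, the question's snippet would reduce to float x = hsum_sse3(foo128);. The CPU still has to support SSE3 at run time.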

user1071136 answered Oct 26 '22 22:10


If you want your code to work on pre-SSE3 CPUs (which do not support _mm_hadd_ps), you can use the following code. It uses more instructions, but decodes to fewer micro-ops on most CPUs.

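 /* temp = [f0+f2, f1+f3, ...]: fold the upper two floats onto the lower two */
 __m128 temp = _mm_add_ps(_mm_movehl_ps(foo128, foo128), foo128);
 float x;
 /* _mm_shuffle_ps takes two source registers; immediate 1 moves element 1 into element 0 */
 _mm_store_ss(&x, _mm_add_ss(temp, _mm_shuffle_ps(temp, temp, 1)));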
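
The same SSE1-only reduction can be sketched as a reusable function with the lane contents spelled out. The name hsum_sse1 is hypothetical, and _mm_cvtss_f32 is used here in place of _mm_store_ss purely as a convenience. Note that the floats are paired as (f0+f2) + (f1+f3), so the result may differ in the last bits from the left-to-right scalar sum a[0] + a[1] + a[2] + a[3].

```c
#include <xmmintrin.h>  /* SSE1 intrinsics only */

/* Hypothetical helper: horizontal sum of the four floats in v without SSE3. */
static float hsum_sse1(__m128 v)
{
    __m128 hi    = _mm_movehl_ps(v, v);   /* [v2, v3, v2, v3] */
    __m128 pairs = _mm_add_ps(v, hi);     /* [v0+v2, v1+v3, 2*v2, 2*v3] */
    /* Broadcast lane 1 (v1+v3) so that it also sits in lane 0. */
    __m128 lane1 = _mm_shuffle_ps(pairs, pairs, _MM_SHUFFLE(1, 1, 1, 1));
    /* Scalar add on lane 0 only: (v0+v2) + (v1+v3). */
    return _mm_cvtss_f32(_mm_add_ss(pairs, lane1));
}
```

Only lane 0 of the final add matters; the upper lanes hold leftovers that are never read.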

Marat Dukhan answered Oct 26 '22 20:10