
How to find the max member in a __m128(F32vec4)

Tags: c, simd, sse

Something like this:

_declspec(align(16)) float dens[4];

//Here the code comes. F32vec4 S_START, Pos, _Vector

*((__m128*)dens) = (S_START - Pos) *_Vector;

float steps = max(max(dens[3], dens[2]), max(dens[1], dens[0]));

How do I do this directly using SSE?

user1468756 asked Jan 16 '23 07:01



1 Answer

There's no single instruction for this. SSE is built for vertical (lane-wise) operations, not horizontal ones across the elements of one register, so you have to shuffle the elements and reduce in steps.

Here's one approach:

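#include <xmmintrin.h>  //  SSE intrinsics

__m128 a = _mm_set_ps(10,9,7,8);    //  elements, low to high: {8,7,9,10}

//  Swap the two halves: {a,b,c,d} -> {c,d,a,b}  (same as immediate 78)
__m128 b = _mm_shuffle_ps(a,a,_MM_SHUFFLE(1,0,3,2));
a = _mm_max_ps(a,b);

//  Swap adjacent pairs: {a,b,c,d} -> {b,a,d,c}  (same as immediate 177)
b = _mm_shuffle_ps(a,a,_MM_SHUFFLE(2,3,0,1));
a = _mm_max_ss(a,b);                //  lowest element now holds the maximum

float out;
_mm_store_ss(&out,a);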

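Note that the final store isn't really meant to be a store. It's just a hack to get the value into a float.

In practice no extra instruction is needed when the compiler keeps scalar floats in SSE registers (the default on x86-64): the result already sits in the lowest element, and the top 3 values are simply ignored. The `_mm_cvtss_f32` intrinsic expresses the same extraction without going through memory.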
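For reuse, the two shuffle/max steps above can be wrapped in a small helper. This is only a sketch: the name `hmax_ps` is made up for illustration, and it assumes an x86 target with SSE enabled (the default on x86-64; 32-bit builds may need a flag such as `-msse`). It uses `_mm_cvtss_f32` in place of the store.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_shuffle_ps, _mm_max_ps, _mm_max_ss */

/* hmax_ps is a hypothetical helper name, not part of any library.
   Returns the largest of the four floats held in v. */
static inline float hmax_ps(__m128 v)
{
    /* Step 1: swap the two halves, {v0,v1,v2,v3} -> {v2,v3,v0,v1},
       then take the element-wise max. Elements 0 and 1 now hold
       max(v0,v2) and max(v1,v3). */
    __m128 t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 0, 3, 2));
    v = _mm_max_ps(v, t);

    /* Step 2: swap adjacent pairs so element 0 of t is max(v1,v3),
       then take a scalar max on the lowest element only. */
    t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    v = _mm_max_ss(v, t);

    /* Read the lowest element as a plain float. */
    return _mm_cvtss_f32(v);
}
```

Given a `__m128` value `v`, the scalar loop over `dens[]` in the question would then reduce to `float steps = hmax_ps(v);`.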

Mysticial answered Jan 28 '23 12:01
