I want the maximum of the four floats in an SSE vector. Currently I do something like this:
__declspec(align(16)) float dens[4];
// S_START, Pos and _Vector are F32vec4 values
*((__m128*)dens) = (S_START - Pos) * _Vector;
float steps = max(max(dens[3], dens[2]), max(dens[1], dens[0]));
How do I do this directly using SSE?
There's no easy way to do this. SSE isn't really designed for horizontal operations (combining the elements inside one register), so you have to shuffle the elements around and compare them vertically.
Here's one approach:
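#include <xmmintrin.h>  // SSE intrinsics

__m128 a = _mm_set_ps(10,9,7,8);   // _mm_set_ps lists lanes high-to-low: lanes 0..3 = {8,7,9,10}
__m128 b = _mm_shuffle_ps(a,a,78); // {a,b,c,d} -> {c,d,a,b}; 78 == _MM_SHUFFLE(1,0,3,2)
a = _mm_max_ps(a,b);               // every lane now holds max(a,c) or max(b,d)
b = _mm_shuffle_ps(a,a,177);       // {a,b,c,d} -> {b,a,d,c}; 177 == _MM_SHUFFLE(2,3,0,1)
a = _mm_max_ss(a,b);               // low lane = max of all four; upper lanes unchanged
float out;
_mm_store_ss(&out,a);              // out == 10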
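The same three steps can be packaged as a reusable function. This is a minimal sketch assuming only SSE (<xmmintrin.h>); the name hmax_ps is hypothetical, and the _MM_SHUFFLE macro is used in place of the literals 78 and 177 (it produces exactly those values).

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: horizontal maximum of the four floats in v. */
static float hmax_ps(__m128 v)
{
    /* Swap the 64-bit halves: {a,b,c,d} -> {c,d,a,b} */
    __m128 t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 0, 3, 2));
    v = _mm_max_ps(v, t);   /* lanes: max(a,c), max(b,d), max(c,a), max(d,b) */
    /* Swap neighbouring lanes: {a,b,c,d} -> {b,a,d,c} */
    t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    v = _mm_max_ss(v, t);   /* only lane 0 matters: max(max(a,c), max(b,d)) */
    float out;
    _mm_store_ss(&out, v);  /* copy lane 0 into a scalar float */
    return out;
}
```

For example, hmax_ps(_mm_set_ps(10,9,7,8)) returns 10.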
Note that the final store isn't really meant to be a store; it's just a way to get the value into a plain float. In practice (e.g. on x86-64) the compiler typically emits no instruction for it, because scalar floats are kept in the lowest element of the same SSE registers. The upper three elements are simply ignored.
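The intrinsic that expresses "just read the low element" is _mm_cvtss_f32, so the store can be dropped altogether. Below is a sketch of the whole computation from the question done in registers, assuming the three operands are available as plain __m128 values; max_steps and its parameter names are hypothetical stand-ins for S_START, Pos and _Vector.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper mirroring the question: the maximum over all lanes of
   (s_start - pos) * vec, without the aligned temporary array. */
static float max_steps(__m128 s_start, __m128 pos, __m128 vec)
{
    __m128 d = _mm_mul_ps(_mm_sub_ps(s_start, pos), vec);
    __m128 t = _mm_shuffle_ps(d, d, _MM_SHUFFLE(1, 0, 3, 2)); /* {c,d,a,b} */
    d = _mm_max_ps(d, t);
    t = _mm_shuffle_ps(d, d, _MM_SHUFFLE(2, 3, 0, 1));        /* {b,a,d,c} */
    d = _mm_max_ss(d, t);
    /* Read lane 0 as a float; typically no instruction is needed on x86-64. */
    return _mm_cvtss_f32(d);
}
```

With this, the dens array and the scalar max calls are no longer needed.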