(I'm only interested in the first three components.)
For example, [ 1 2 3 ? ] should produce [ 0 0 -1 ? ].
Also, it's important to have only one "bit" set, so that [ 1 2 2 ? ] does not produce [ 0 -1 -1 ? ] but rather [ 0 -1 0 ? ] or [ 0 0 -1 ? ] (it doesn't matter which one).
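To pin down the requirement, here is a scalar sketch of the result being asked for. The helper name max_mask3_ref and the choice to keep the earliest maximum on ties are assumptions made for illustration only; the question accepts either tied lane.

```c
#include <stdint.h>

/* Hypothetical reference helper (not from the original post): writes an
   all-ones lane (-1) at one maximum among v[0..2] and 0 in the other two.
   Lane 3 (the '?' component) is left untouched. */
static void max_mask3_ref(const float v[4], int32_t out[4])
{
    int best = 0;
    for (int i = 1; i < 3; ++i)
        if (v[i] > v[best])   /* strict '>' keeps the earliest maximum on ties */
            best = i;
    for (int i = 0; i < 3; ++i)
        out[i] = (i == best) ? -1 : 0;
}
```

Any SIMD version can be compared against this on small inputs, allowing for the fact that a different tied lane is also acceptable.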
The latter (bad) result is what you get, for example, by extracting the horizontal max and comparing it to the original:
__m128 abcd; // input
__m128 ccac = _mm_shuffle_ps(abcd, abcd, 0x8A);
__m128 abcd_ccac = _mm_max_ps(abcd, ccac);
__m128 babb = _mm_shuffle_ps(abcd, abcd, 0x51);
__m128 abcd_ccac_babb = _mm_max_ps(abcd_ccac, babb);
__m128 mask = _mm_cmpeq_ps(abcd, abcd_ccac_babb);
Perhaps some bitwise operations to get rid of duplicate set bits?
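To make the duplicate problem concrete, the sketch below wraps the snippet above in a function and reads the lanes back with _mm_movemask_ps. The function names are made up for this example. The second helper shows one possible bitwise way to drop duplicates on the scalar side (isolating the lowest set bit); it is only an illustration and does not by itself produce a vector mask.

```c
#include <xmmintrin.h>

/* Hypothetical wrapper: returns the low three sign bits of the
   compare-to-max mask (bit 0 = a, bit 1 = b, bit 2 = c). */
static int eq_max_bits(__m128 abcd)
{
    __m128 ccac = _mm_shuffle_ps(abcd, abcd, 0x8A);
    __m128 babb = _mm_shuffle_ps(abcd, abcd, 0x51);
    __m128 hmax = _mm_max_ps(_mm_max_ps(abcd, ccac), babb);
    return _mm_movemask_ps(_mm_cmpeq_ps(abcd, hmax)) & 7;
}

/* Keeps only the lowest set bit, e.g. binary 110 becomes 010. */
static int lowest_set_bit(int bits)
{
    return bits & -bits;
}
```

For [ 1 2 2 ? ] eq_max_bits reports both b and c (binary 110), which is exactly the unwanted case; lowest_set_bit reduces that to b alone.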
Update:
As a follow-up, I've made another (bad) solution.
The key is to compare each component to the others while avoiding symmetric non-strict comparisons (that is, not having a >= b in one place and b >= a in another).
a > b & a >= c
b > c & b >= a
c > a & c >= b
to yield:
([ a b c ? ] > [ b c a ? ]) & ([ a b c ? ] >= [ c a b ? ])
and in code:
__m128 abcd; // input
__m128 bcad = _mm_shuffle_ps(abcd, abcd, 0xC9);
__m128 gt = _mm_cmpgt_ps(abcd, bcad);
__m128 cabd = _mm_shuffle_ps(abcd, abcd, 0xD2);
__m128 ge = _mm_cmpge_ps(abcd, cabd);
__m128 mask = _mm_and_ps(gt, ge);
It fails in the case of [ x x x ? ] (it produces [ 0 0 0 ? ]).
Getting close :-)
Any ideas?
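A small sketch for checking this attempt on a few inputs. It is the same sequence as above wrapped in a function whose name is an assumption of this example, with the lanes read back as bits through _mm_movemask_ps.

```c
#include <xmmintrin.h>

/* Hypothetical wrapper around the second attempt: bit 0 = a, bit 1 = b,
   bit 2 = c. Bit 3 is always 0 because d > d is never true. */
static int gt_ge_bits(__m128 abcd)
{
    __m128 bcad = _mm_shuffle_ps(abcd, abcd, 0xC9);
    __m128 cabd = _mm_shuffle_ps(abcd, abcd, 0xD2);
    __m128 gt = _mm_cmpgt_ps(abcd, bcad);   /* [ a>b  b>c  c>a  0 ] */
    __m128 ge = _mm_cmpge_ps(abcd, cabd);   /* [ a>=c b>=a c>=b 1 ] */
    return _mm_movemask_ps(_mm_and_ps(gt, ge));
}
```

Two-way ties resolve to a single lane (for example [ 1 2 2 ? ] selects c), but [ x x x ? ] returns 0 because every strict comparison fails.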
Update:
I'm now using the following solution:
__m128 abcd; // input
__m128 bcad = _mm_shuffle_ps(abcd, abcd, 0xC9);
__m128 gt = _mm_cmpgt_ps(abcd, bcad);
__m128 cabd = _mm_shuffle_ps(abcd, abcd, 0xD2);
__m128 ge = _mm_cmpge_ps(abcd, cabd);
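__m128 both = _mm_and_ps(gt, ge); // at most one of lanes 0..2 is set; lane 3 is always 0
__m128i bits = _mm_setr_epi32(_mm_movemask_ps(both), -1, -1, -1);
__m128i dirt = _mm_cmpeq_epi32(bits, _mm_setzero_si128()); // lane 0 is all ones only when no lane was selected (the [ x x x ? ] case)
__m128i mask = _mm_or_si128(dirt, _mm_castps_si128(both));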
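For anyone who wants to try this, here is the same sequence as a function, plus a helper for checking that exactly one lane is set. Both function names are assumptions of this sketch, and the intermediate is named both here. It has only been reasoned through for ordinary finite inputs; NaN handling is not considered.

```c
#include <emmintrin.h>

/* Hypothetical wrapper around the sequence above (SSE2 only). */
static __m128i max_mask3(__m128 abcd)
{
    __m128 bcad = _mm_shuffle_ps(abcd, abcd, 0xC9);
    __m128 cabd = _mm_shuffle_ps(abcd, abcd, 0xD2);
    __m128 both = _mm_and_ps(_mm_cmpgt_ps(abcd, bcad),
                             _mm_cmpge_ps(abcd, cabd));
    /* If no lane was selected (all three equal), force lane 0 on. */
    __m128i bits = _mm_setr_epi32(_mm_movemask_ps(both), -1, -1, -1);
    __m128i dirt = _mm_cmpeq_epi32(bits, _mm_setzero_si128());
    return _mm_or_si128(dirt, _mm_castps_si128(both));
}

/* Returns the index (0..2) of the single set lane, or -1 if the
   mask is not one-hot. */
static int one_hot_lane(__m128i mask)
{
    switch (_mm_movemask_ps(_mm_castsi128_ps(mask))) {
    case 1:  return 0;
    case 2:  return 1;
    case 4:  return 2;
    default: return -1;
    }
}
```

Looping a, b and c over a few small values and checking that one_hot_lane never returns -1 covers every tie pattern, including [ x x x ? ].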
I have not tested this, but I trust it will get you -1 in only the first (highest-order, i.e. leftmost in [ a b c ? ]) occurrence of the maximum value:
__m128 abcd; // input
__m128 ccac = _mm_shuffle_ps( abcd,abcd,0x8A );
__m128 abcd_ccac = _mm_max_ps( abcd,ccac );
__m128 babb = _mm_shuffle_ps( abcd,abcd,0x51 );
__m128 abcd_ccac_babb = _mm_max_ps( abcd_ccac,babb );
__m128 mask = _mm_cmpeq_ps( abcd,abcd_ccac_babb );
// set the '?' position in mask to zero (_mm_blend_ps requires SSE4.1)
mask = _mm_blend_ps( mask,_mm_setzero_ps(),0x08 );
// shift mask left 32 bits (one lane up), shifting in zeros: [ 0 m0 m1 0 ]
__m128 maskSll32 = _mm_shuffle_ps( mask,mask,_MM_SHUFFLE( 3,1,0,3 ) );
// shift mask left 64 bits (two lanes up), shifting in zeros: [ 0 0 m0 0 ]
__m128 maskSll64 = _mm_shuffle_ps( mask,mask,_MM_SHUFFLE( 3,0,3,3 ) );
// andnot the shifted masks with mask
// in doing so, an earlier set lane suppresses any set lanes which follow
mask = _mm_andnot_ps( maskSll32,mask );
mask = _mm_andnot_ps( maskSll64,mask );
// select -1 using the final mask (this yields the float -1.0f; use mask itself if an all-ones integer lane is wanted)
__m128 result = _mm_and_ps( mask,_mm_set1_ps( -1.0f ) );
Reverse the shifting direction to yield -1 in the lowest-order (rightmost) max position instead.
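Since the code above is untested, here is a sketch of the same idea as a function that can be checked on a few inputs. One deliberate difference: the '?' lane is cleared with _mm_and_ps and a constant instead of _mm_blend_ps, so only SSE2 is needed. The function name and the names keep3, up1 and up2 are assumptions of this example, and it returns the all-ones mask rather than -1.0f.

```c
#include <emmintrin.h>

/* Hypothetical SSE2-only variant: all-ones in the first lane (a, then b,
   then c) that equals the maximum of a, b, c; zero elsewhere. */
static __m128 first_max_mask3(__m128 abcd)
{
    __m128 ccac = _mm_shuffle_ps(abcd, abcd, 0x8A);
    __m128 babb = _mm_shuffle_ps(abcd, abcd, 0x51);
    __m128 hmax = _mm_max_ps(_mm_max_ps(abcd, ccac), babb);
    __m128 mask = _mm_cmpeq_ps(abcd, hmax);

    /* Clear the '?' lane so it can serve as the source of zeros below. */
    __m128 keep3 = _mm_castsi128_ps(_mm_setr_epi32(-1, -1, -1, 0));
    mask = _mm_and_ps(mask, keep3);

    /* Move the mask up by one and two lanes: [ 0 m0 m1 0 ], [ 0 0 m0 0 ]. */
    __m128 up1 = _mm_shuffle_ps(mask, mask, _MM_SHUFFLE(3, 1, 0, 3));
    __m128 up2 = _mm_shuffle_ps(mask, mask, _MM_SHUFFLE(3, 0, 3, 3));

    /* A set lane clears every set lane after it. */
    mask = _mm_andnot_ps(up1, mask);
    mask = _mm_andnot_ps(up2, mask);
    return mask;
}
```

Reading the result back with _mm_movemask_ps gives bit 0 for a, bit 1 for b and bit 2 for c, so [ 1 2 2 ? ] should come out as binary 010 and [ x x x ? ] as binary 001.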