I have the following bottleneck function.
typedef unsigned char byte;
void CompareArrays(const byte * p1Start, const byte * p1End, const byte * p2, byte * p3)
{
const byte b1 = 128-30;
const byte b2 = 128+30;
for (const byte * p1 = p1Start; p1 != p1End; ++p1, ++p2, ++p3) {
*p3 = (*p1 < *p2 ) ? b1 : b2;
}
}
I want to replace C++
code with SSE2 intinsic functions. I have tried _mm_cmpgt_epi8
but it used signed compare. I need unsigned compare.
Is there any trick (SSE, SSE2, SSSE3) to solve my problem?
Note: I do not want to use multi-threading in this case.
Instead of offsetting your signed values to make them unsigned, a slightly more efficient way would be to do the following:
_mm_min_epu8
to get the unsigned min of p1, p2_mm_cmpeq_epi8
_mm_or_si128
and _mm_andc_si128
to select the appropriate b1/b2 valuesNote that this is 4 instructions in total, compared with 5 using the offset + signed comparison approach.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With