I'm trying to use SSE instructions to do some image filtering. The image I'm using has one byte per pixel (greyscale, 0-255), and I need to compare the packed unsigned bytes with a greater-than comparison. I've looked in Intel's manual, and the comparison exists but only for signed bytes (PCMPGTB). How can I do this comparison for unsigned bytes? Thanks in advance.
It is indeed not possible to make unsigned comparisons directly, until AVX-512 (see footnote 1).
But you can add -128 to each value (or subtract 128, or XOR 0x80, or similar). That'll turn 0 into -128, 255 into 127, and other values into values in between; the result being that you get the correct results from the comparison.
Expanding it to words should work too, but sounds a fair bit slower, since you're getting half the work done per instruction.
_mm_cmpgt_epu8(a, b) = _mm_cmpgt_epi8(
        _mm_xor_si128(a, _mm_set1_epi8(-128)),  // range-shift unsigned to signed
        _mm_xor_si128(b, _mm_set1_epi8(-128)))
pxor can run on more execution ports than paddb on some CPUs, so it's normally the best option if you need to do this. XOR is add-without-carry, and the carry-out from adding or subtracting 0x80 goes out the top of each byte element.
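To make the range-shift trick concrete, here is a minimal sketch in C using SSE2 intrinsics. The names cmpgt_epu8 and cmpgt_u8_16 are hypothetical helpers chosen for this illustration (there is no such intrinsic before AVX-512), and the buffers are assumed to hold at least 16 bytes each.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Unsigned per-byte a > b: each result byte is 0xFF where true, 0x00 where false.
   cmpgt_epu8 is a hypothetical helper name, not a real intrinsic. */
static inline __m128i cmpgt_epu8(__m128i a, __m128i b)
{
    /* Flipping the top bit maps 0..255 onto -128..127 and preserves ordering. */
    const __m128i bias = _mm_set1_epi8(-128);
    return _mm_cmpgt_epi8(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}

/* Hypothetical wrapper: compares 16 pixels of a against 16 pixels of b
   and writes the 16-byte mask to out. Unaligned pointers are fine. */
static inline void cmpgt_u8_16(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, cmpgt_epu8(va, vb));
}
```

For a threshold filter, b would typically be replaced by a vector built once with _mm_set1_epi8, and the XOR of that constant can then be hoisted out of the loop.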
Footnote 1: AVX-512BW has vpcmpub, which takes a comparison predicate as an immediate, like cmpps. The intrinsics _mm_cmp[eq|ge|gt|le|lt|neq]_epu8_mask compare into a mask instead of into another vector, because that's how AVX-512 compare instructions work, e.g. __mmask16 _mm_cmpgt_epu8_mask (__m128i a, __m128i b) in Intel's intrinsics guide (the 128-bit form also requires AVX-512VL).
The unsigned comparison (a >= b) is identical to maxu( a, b ) == a, so you can use
_mm_cmpeq_epi8( a, _mm_max_epu8(a,b)) --> a >= b "cmpge_epu8(a,b)"
If you need a < or > comparison, you need to invert the result, at which point Alcaro's approach may be as good (though that method needs a register to carry a constant for the inversion). But for a >= or <= comparison this is definitely better (since there's no _mm_cmple_epi8 or _mm_cmpge_epi8 to use, even after converting unsigned to signed range).
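As a rough sketch of the max-based approach and of the inversion needed for a strict comparison, the following C code uses SSE2 intrinsics. The helper names cmpge_epu8 and cmplt_epu8 are hypothetical, chosen only for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Unsigned per-byte a >= b, using max(a, b) == a.
   cmpge_epu8 is a hypothetical helper name, not a real intrinsic. */
static inline __m128i cmpge_epu8(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi8(a, _mm_max_epu8(a, b));
}

/* Unsigned per-byte a < b, obtained by inverting a >= b.
   The XOR with all-ones is the extra step (and extra constant)
   that the strict comparisons cost with this method. */
static inline __m128i cmplt_epu8(__m128i a, __m128i b)
{
    return _mm_xor_si128(cmpge_epu8(a, b), _mm_set1_epi8(-1));
}
```

Swapping the arguments gives the remaining cases: cmpge_epu8(b, a) is a <= b, and cmplt_epu8(b, a) is a > b.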
Proposing a small but important enhancement to @greggo's solution. The maxu( a, b ) == a form has a drawback: the max overwrites "a", so you have to back up "a" before it, which costs an extra operation. Something like this:
movq mmc, mma
pmaxub mma, mmb
pcmpeqb mma, mmc
The minu( a, b ) == b form gives exactly the same result (a >= b) but leaves "b" intact in its register, so it can be used directly as the operand of the equality check:
pminub mma, mmb
pcmpeqb mma, mmb
The gain is significant: just 2 operations instead of 3.
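For comparison, here is a minimal sketch of the min-based form with SSE2 intrinsics in C. The helper names are hypothetical. Note that with intrinsics the compiler does the register allocation, so the saved copy mainly matters for hand-written assembly; whether a compiler emits an extra move for the max-based version depends on the compiler and the surrounding code.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Unsigned per-byte a >= b, using min(a, b) == b.
   cmpge_epu8_min is a hypothetical helper name, not a real intrinsic. */
static inline __m128i cmpge_epu8_min(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi8(_mm_min_epu8(a, b), b);
}

/* Unsigned per-byte a <= b, using min(a, b) == a. */
static inline __m128i cmple_epu8_min(__m128i a, __m128i b)
{
    return _mm_cmpeq_epi8(_mm_min_epu8(a, b), a);
}
```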