 

SSE Compare Packed Unsigned Bytes

I'm trying to use SSE instructions to do some image filtering. The image has one byte per pixel (8-bit greyscale), and I need to compare packed unsigned bytes using a greater-than comparison. I've looked in Intel's manual, and the comparison exists, but only for signed bytes (PCMPGTB). How can I do this comparison for unsigned bytes? Thanks in advance.

Lautaro asked Apr 25 '13 00:04



3 Answers

Indeed, there is no direct unsigned byte comparison until AVX-512 (see footnote 1).

But you can add -128 to each value in both operands (or subtract 128, or XOR with 0x80, or similar). That maps 0 to -128, 255 to 127, and everything else to the values in between, in the same order, so the signed comparison then gives the correct unsigned result.
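A brute-force scalar check is a quick way to convince yourself of this mapping. This is only a sketch in plain C; the function names are made up for illustration. The first function compares the unsigned ordering of every byte pair with the signed ordering of the biased bytes, and the second confirms that XOR with 0x80, adding 128 and subtracting 128 all give the same byte.

```c
#include <stdint.h>

/* Reinterpret a value in 0..255 as a two's-complement signed byte. */
static int as_signed_byte(int v) {
    return v >= 128 ? v - 256 : v;
}

/* Counts byte pairs where the signed comparison of the biased values
   disagrees with the unsigned comparison of the original values. */
int count_bias_mismatches(void) {
    int mismatches = 0;
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            int sa = as_signed_byte(a ^ 0x80);
            int sb = as_signed_byte(b ^ 0x80);
            if ((a > b) != (sa > sb))
                ++mismatches;
        }
    }
    return mismatches;
}

/* Counts bytes where XOR 0x80 differs from adding or subtracting 128
   modulo 256. */
int count_xor_add_mismatches(void) {
    int mismatches = 0;
    for (int a = 0; a < 256; ++a) {
        int x   = a ^ 0x80;
        int add = (a + 128) & 0xFF;
        int sub = (a + 256 - 128) & 0xFF;
        if (x != add || x != sub)
            ++mismatches;
    }
    return mismatches;
}
```

Both functions should return 0, which is why any of the three operations can be used for the range shift.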

Widening the bytes to 16-bit words should work too, but is likely a fair bit slower, since each instruction then does half as much work.

_mm_cmpgt_epu8(a, b) = _mm_cmpgt_epi8(
        _mm_xor_si128(a, _mm_set1_epi8(-128)),  // range-shift unsigned to signed
        _mm_xor_si128(b, _mm_set1_epi8(-128)))

pxor can run on more execution ports than paddb on some CPUs, so it's normally the best option if you need to do this. XOR is add-without-carry, and the carry-out from adding or subtracting 0x80 goes out the top of each byte element.
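For the image-filtering case in the question, the XOR trick could be wrapped up roughly like this. It is a minimal sketch assuming SSE2 and unaligned buffers; `cmpgt_epu8` and `threshold_mask` are hypothetical helper names, not real intrinsics.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unsigned a > b per byte, 0xFF where true, 0x00 where
   false. Flipping the top bit of both operands shifts 0..255 to -128..127. */
static inline __m128i cmpgt_epu8(__m128i a, __m128i b) {
    const __m128i bias = _mm_set1_epi8(-128);
    return _mm_cmpgt_epi8(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}

/* Hypothetical example: out[i] = 0xFF where px[i] > threshold, else 0x00. */
void threshold_mask(const uint8_t *px, uint8_t *out, size_t n,
                    uint8_t threshold) {
    const __m128i t = _mm_set1_epi8((char)threshold);
    size_t i = 0;
    /* 16 pixels per iteration. */
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(px + i));
        _mm_storeu_si128((__m128i *)(out + i), cmpgt_epu8(v, t));
    }
    /* Scalar tail for the remaining 0..15 pixels. */
    for (; i < n; ++i)
        out[i] = (px[i] > threshold) ? 0xFF : 0x00;
}
```

When one operand is a loop-invariant threshold like this, a compiler will typically hoist its XOR out of the loop, so the per-iteration cost is one pxor plus the pcmpgtb.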


Footnote 1: With AVX-512BW:

vpcmpub takes a comparison predicate as an immediate, like cmpps. The _mm_cmp[eq|ge|gt|le|lt|neq]_epu8_mask intrinsics compare into a mask instead of into another vector, because that's how AVX-512 compare instructions work (the 128-bit forms also require AVX-512VL). For example, from Intel's intrinsics guide:
__mmask16 _mm_cmpgt_epu8_mask (__m128i a, __m128i b)

Alcaro answered Nov 12 '22 11:11



The unsigned comparison (a >= b) is identical to maxu( a, b ) == a, so you can use

_mm_cmpeq_epi8( a, _mm_max_epu8(a,b))   -->   a >= b  "cmpge_epu8(a,b)"

If you need a < or > comparison, you need to invert the result, at which point Alcaro's approach may be as good (though that method needs a register to carry a constant for the inversion). But for a >= or <= comparison this is definitely better (since there's no _mm_cmple_epi8 or _mm_cmpge_epi8 to use, even after converting unsigned to signed range).
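As a rough sketch of the above with SSE2 intrinsics: the helper names below are hypothetical, and the strict comparison is built by inverting the <= mask with an all-ones constant.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: unsigned a >= b per byte, via max(a, b) == a. */
static inline __m128i cmpge_epu8(__m128i a, __m128i b) {
    return _mm_cmpeq_epi8(a, _mm_max_epu8(a, b));
}

/* Unsigned a <= b is just the same test with the operands swapped. */
static inline __m128i cmple_epu8(__m128i a, __m128i b) {
    return cmpge_epu8(b, a);
}

/* Unsigned a > b is NOT (a <= b); the XOR with all-ones is the inversion
   that costs an extra instruction and a constant. */
static inline __m128i cmpgt_epu8_via_max(__m128i a, __m128i b) {
    return _mm_xor_si128(cmple_epu8(a, b), _mm_set1_epi8(-1));
}
```

If the mask only feeds a blend or an and/andnot pair, the inversion can often be avoided by swapping the two selected inputs instead.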

greggo answered Nov 12 '22 11:11



A small but important enhancement to @greggo’s solution. The form

maxu( a, b ) == a

has a drawback: you have to back up “a” before the max operation, which costs an extra instruction, something like this:

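movq    mmc, mma    ; copy a, because the max overwrites its destination
pmaxub  mma, mmb    ; mma = max(a, b)
pcmpeqb mma, mmc    ; mma = (max(a, b) == a), i.e. a >= b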

The

minu( a, b ) == b

gives exactly the same result, but leaves the operand needed for the equality check ("b") untouched, so no copy is required:

pminub  mma, mmb    ; mma = min(a, b)
pcmpeqb mma, mmb    ; mma = (min(a, b) == b), i.e. a >= b

The gain is significant: just 2 operations instead of 3.
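In intrinsics form the min-based variant might look like the sketch below (the helper name is hypothetical). Note that the instruction-count argument applies to hand-written two-operand assembly; with intrinsics the compiler does the register allocation, so this only illustrates that the two formulations are equivalent.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: unsigned a >= b per byte, via min(a, b) == b.
   The min result is compared against b, so a never has to be saved. */
static inline __m128i cmpge_epu8_via_min(__m128i a, __m128i b) {
    return _mm_cmpeq_epi8(_mm_min_epu8(a, b), b);
}
```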

Zoltán Bíró answered Nov 12 '22 12:11
