
Comparison and Extraction using SSE

Tags: c++, c, simd, sse

What is the best way to compare two integer registers pairwise and extract the equal elements using SSE instructions? For example, if a = [6 4 7 2] and b = [2 4 9 2] (each register contains four 32-bit integers), the result should be [4 2 x x]. An alternative form of this question is how to obtain a binary mask of equal elements (..0101b) that can be used for shuffling, or as an index to look up a parameter for a shuffle instruction in a precomputed table.

user1128016 asked Apr 20 '26 10:04


2 Answers

It is not possible to extract and move equal elements with one instruction. But a mask of equal elements can easily be obtained with pcmpeqd (_mm_cmpeq_epi32):

#include <emmintrin.h>  // SSE2

__m128i zero = _mm_set1_epi32(0);
__m128i a = _mm_set_epi32(6, 4, 7, 2);
__m128i b = _mm_set_epi32(2, 4, 9, 2);

__m128i mask = _mm_cmpeq_epi32(a, b);   // mask is now 0, -1, 0, -1
mask = _mm_sub_epi32(zero, mask);       // mask is now 0,  1, 0,  1

Edit: If you want an index into a lookup table of shuffle constants, you need additional operations, for example:

#include <tmmintrin.h>  // SSSE3, needed for _mm_hadd_epi32 (phaddd)

const __m128i bits = _mm_set_epi32(1, 2, 4, 8);

__m128i a = _mm_set_epi32(6, 4, 7, 2);
__m128i b = _mm_set_epi32(2, 4, 9, 2);

__m128i bitvector = _mm_and_si128(bits, _mm_cmpeq_epi32(a, b));  // 0, 2, 0, 8
bitvector = _mm_hadd_epi32(bitvector, bitvector);
bitvector = _mm_hadd_epi32(bitvector, bitvector);
// an index in the range 0...15 is now in the low 32 bits of bitvector
int index = _mm_cvtsi128_si32(bitvector);  // 10 for these inputs

There might be better algorithms than using a lookup table for computing the shuffle, possibly calculating the shuffle directly using a De Bruijn multiplication. OTOH, if you have more than 4 ints to compare, each additional 4 ints would only come at the cost of one additional phaddd.
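
To make the point about more than 4 ints concrete, here is a rough sketch for eight ints (two register pairs). It is only an illustration under my own assumptions: the helper name equal_index_pair, the USE_SSSE3 macro (so it builds with GCC or Clang without -mssse3), and the choice to pack both indices into one int are not from the answer. In this particular arrangement the second group of four reuses the same two phaddd instructions.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi32, _mm_and_si128 */
#include <tmmintrin.h>  /* SSSE3: _mm_hadd_epi32 (phaddd) */

/* Assumption: lets GCC/Clang compile the SSSE3 call without -mssse3. */
#if defined(__GNUC__)
#define USE_SSSE3 __attribute__((target("ssse3")))
#else
#define USE_SSSE3
#endif

/* Hypothetical helper: compares a0 with b0 and a1 with b1.
   Returns index0 | (index1 << 4), each index in the range 0...15. */
USE_SSSE3
static int equal_index_pair(__m128i a0, __m128i b0, __m128i a1, __m128i b1)
{
    const __m128i bits = _mm_set_epi32(1, 2, 4, 8);
    __m128i v0 = _mm_and_si128(bits, _mm_cmpeq_epi32(a0, b0));
    __m128i v1 = _mm_and_si128(bits, _mm_cmpeq_epi32(a1, b1));

    /* First phaddd: two partial sums of v0, then two partial sums of v1. */
    __m128i h = _mm_hadd_epi32(v0, v1);
    /* Second phaddd: element 0 = sum(v0), element 1 = sum(v1). */
    h = _mm_hadd_epi32(h, h);

    int index0 = _mm_cvtsi128_si32(h);
    int index1 = _mm_cvtsi128_si32(_mm_srli_si128(h, 4));
    return index0 | (index1 << 4);
}
```

With the a and b from above passed as the first pair, the low four bits of the result are 10, the same value the single-register code produces.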

Gunther Piez answered Apr 22 '26 22:04


I would probably use a variant of what drhirsch proposes:

int index = _mm_movemask_ps(_mm_castsi128_ps(_mm_cmpeq_epi32(a, b)));

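This gives you an equivalent 4-bit index for looking up a shuffle mask, using only two instructions (pcmpeqd and movmskps). Bit i of the index corresponds to element i, so for the example above it is 0101b, whereas the phaddd version with bits = (1, 2, 4, 8) yields the bit-reversed value 1010b.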
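
For completeness, here is a minimal sketch of what the lookup step could look like once you have that index. Everything beyond the compare and the movemask is my assumption rather than something the answers specify: the shuffle is done with pshufb (_mm_shuffle_epi8, SSSE3), the table is built at run time, and the names pack_table, init_pack_table, pack_equal and USE_SSSE3 are hypothetical.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi32, _mm_movemask_ps */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 (pshufb) */
#include <stdint.h>

/* Assumption: lets GCC/Clang compile the SSSE3 call without -mssse3. */
#if defined(__GNUC__)
#define USE_SSSE3 __attribute__((target("ssse3")))
#else
#define USE_SSSE3
#endif

/* pack_table[m] is the pshufb control that moves the elements whose bits
   are set in m to the front, keeping their order. */
static uint8_t pack_table[16][16];

/* Must be called once before pack_equal. */
static void init_pack_table(void)
{
    for (int m = 0; m < 16; ++m) {
        int out = 0;
        for (int lane = 0; lane < 4; ++lane) {
            if (m & (1 << lane)) {
                for (int byte = 0; byte < 4; ++byte)
                    pack_table[m][out++] = (uint8_t)(4 * lane + byte);
            }
        }
        while (out < 16)
            pack_table[m][out++] = 0x80;  /* high bit set: pshufb writes zero */
    }
}

/* Moves the elements of a that equal the matching element of b to the
   low end of the result and stores how many there were in *count. */
USE_SSSE3
static __m128i pack_equal(__m128i a, __m128i b, int *count)
{
    int m = _mm_movemask_ps(_mm_castsi128_ps(_mm_cmpeq_epi32(a, b)));
    __m128i ctrl = _mm_loadu_si128((const __m128i *)pack_table[m]);
    *count = (m & 1) + ((m >> 1) & 1) + ((m >> 2) & 1) + ((m >> 3) & 1);
    return _mm_shuffle_epi8(a, ctrl);
}
```

If the inputs are built with _mm_setr_epi32(6, 4, 7, 2) and _mm_setr_epi32(2, 4, 9, 2), so that 6 is element 0 (the reverse of the _mm_set_epi32 order used above), the index is 1010b and the result holds 4 and 2 in its two lowest elements, with *count set to 2. The unused elements come out as zero in this sketch rather than as don't-care values.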

Stephen Canon answered Apr 22 '26 23:04


