
NEON vs Intel SSE - equivalence of certain operations

Tags: c++, c, simd, sse, neon

I'm having trouble figuring out the NEON equivalents of a couple of Intel SSE operations. It seems that NEON cannot operate on an entire Q register at once (as a single 128-bit value). I haven't found anything in the arm_neon.h header or in the NEON intrinsics reference.

What I want to do is the following:

// Intel SSE
// Shift the entire 128-bit value right by 2 bytes; zeros are shifted in,
// so there is no sign extension.
__m128i val = _mm_srli_si128(vector_of_8_s16, 2);
// Insert the least significant 16 bits of "some_16_bit_val" (the whole
// value in this case) into the selected 16-bit element of "val"
// (the element with index 7 in this case).
val = _mm_insert_epi16(val, some_16_bit_val, 7);

I've looked at the shift operations provided by NEON but could not find an equivalent way of doing the above (I don't have much experience with NEON). Is it possible to do the above? I guess it is; I just don't know how. Any pointers are greatly appreciated.

celavek asked Aug 26 '11 10:08



1 Answer

You want the VEXT instruction. Your example would look something like:

int16x8_t val = vextq_s16(vector_of_8_s16, another_vector_s16, 1);

After this, bits 0-111 of val will contain bits 16-127 of vector_of_8_s16, and bits 112-127 of val will contain bits 0-15 of another_vector_s16.
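To reproduce the SSE snippet from the question, lane 0 of the second operand has to hold the 16-bit value you want to insert. Here is a minimal sketch of that, assuming the value is simply broadcast with vdupq_n_s16. The helper name shift_in_s16 and the scalar fallback for non-NEON builds are my own additions for illustration, not part of any library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: drops lane 0, moves lanes 1..7 down by one lane,
 * and places new_val in lane 7. This mirrors _mm_srli_si128(v, 2)
 * followed by _mm_insert_epi16(v, new_val, 7). */
static void shift_in_s16(const int16_t in[8], int16_t new_val, int16_t out[8])
{
    assert(in != NULL && out != NULL);
#if HAVE_NEON
    int16x8_t v    = vld1q_s16(in);
    /* Only lane 0 of the second operand is used when the offset is 1. */
    int16x8_t fill = vdupq_n_s16(new_val);
    vst1q_s16(out, vextq_s16(v, fill, 1));
#else
    /* Scalar fallback with the same lane semantics, so the sketch also
     * builds on targets without NEON. */
    for (int i = 0; i < 7; ++i) {
        out[i] = in[i + 1];
    }
    out[7] = new_val;
#endif
}
```

If you would rather keep the two steps separate, as in the SSE version, you could probably pass vdupq_n_s16(0) as the second operand and then set lane 7 with vsetq_lane_s16(some_16_bit_val, val, 7).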

Stephen Canon answered Nov 03 '22 01:11
