
How to load two sets of 4 shorts into an XMM register?

I'm just getting started with SSE intrinsics using Visual C++ 2012 and I need some pointers (no pun intended).

I have two arrays containing 4 signed shorts each (each array is thus 64 bits, 128 bits in total). I want to load one into the upper 64 bits of an XMM register and the other into the lower 64 bits. Can I accomplish this efficiently using SSE intrinsics? If so, how?

Asik asked Apr 26 '13 00:04


1 Answer

SSE2:

short A[] = {0,1,2,3};
short B[] = {4,5,6,7};

__m128i a,b,v;
a = _mm_loadl_epi64((const __m128i*)A);
b = _mm_loadl_epi64((const __m128i*)B);
v = _mm_unpacklo_epi64(a,b);

// v = {0,1,2,3,4,5,6,7}
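To confirm which array ends up in which half, the SSE2 snippet above can be wrapped in a small function and the register stored back to memory. This is only a sketch: the names load_two_halves and store_lanes are made up for illustration and are not part of any library.

```cpp
#include <emmintrin.h>  // SSE2: _mm_loadl_epi64, _mm_unpacklo_epi64, _mm_storeu_si128
#include <cassert>      // only needed for the usage check described below

// Hypothetical helper: 'lo' fills bits 0..63 of the result, 'hi' fills bits 64..127.
static inline __m128i load_two_halves(const short* lo, const short* hi) {
    // Each 64-bit load fills the low half and zeroes the upper half.
    __m128i a = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(lo));
    __m128i b = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(hi));
    // Result: low quadword of a, followed by low quadword of b.
    return _mm_unpacklo_epi64(a, b);
}

// Hypothetical helper: copy all eight 16-bit lanes to memory for inspection.
static inline void store_lanes(__m128i v, short out[8]) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);  // unaligned store
}
```

With A = {0,1,2,3} and B = {4,5,6,7}, calling store_lanes(load_two_halves(A, B), out) should leave out[i] == i for every i, so A occupies the lower 64 bits and B the upper 64 bits. Swap the arguments if you want the opposite layout.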

SSE4.1 + x64:

short A[] = {0,1,2,3};
short B[] = {4,5,6,7};

__m128i v;
v = _mm_loadl_epi64((const __m128i*)A);
v = _mm_insert_epi64(v,*(const long long*)B,1);

// v = {0,1,2,3,4,5,6,7}
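The cast *(const long long*)B reads short data through a long long pointer, which some compilers may treat as a strict-aliasing violation. A possible variant, sketched below, copies the 8 bytes with memcpy first. The helper name load_two_halves_sse41 and the SKETCH_SSE41 macro are invented for this example, and the per-function target attribute is an assumption that only matters when building with GCC or Clang without -msse4.1.

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_insert_epi64 (available on x64 targets only)
#include <cstring>      // std::memcpy
#include <cassert>      // only needed if you add checks around the call

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_SSE41 __attribute__((target("sse4.1")))  // enable SSE4.1 for this function
#else
#define SKETCH_SSE41                                    // assumed unnecessary on MSVC
#endif

// Hypothetical helper: 'lo' fills bits 0..63, 'hi' fills bits 64..127.
SKETCH_SSE41
static inline __m128i load_two_halves_sse41(const short* lo, const short* hi) {
    long long hi_bits;
    std::memcpy(&hi_bits, hi, sizeof hi_bits);  // aliasing-safe read of 4 shorts
    __m128i v = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(lo));
    return _mm_insert_epi64(v, hi_bits, 1);     // overwrite the upper quadword
}
```

Compilers generally turn a fixed-size 8-byte memcpy like this into a single load, so the variant should not cost anything over the cast, but check the generated assembly if it matters.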

Note that neither A nor B has to be aligned: the 64-bit loads used here accept any address. I'd still recommend aligning both to 8 bytes, since an 8-byte-aligned 64-bit load can never straddle a cache line.
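One way to follow the alignment recommendation is to request it at the declaration, as in the sketch below. The names A8, B8, is_aligned_8 and load_aligned_halves are hypothetical. Note that alignas is C++11 syntax; older Visual C++ releases such as 2012 would, as far as I know, need __declspec(align(8)) instead.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>      // std::uintptr_t
#include <cassert>

// Illustrative data, forced onto 8-byte boundaries.
alignas(8) static const short A8[4] = {0, 1, 2, 3};
alignas(8) static const short B8[4] = {4, 5, 6, 7};

// Hypothetical helper: true if p sits on an 8-byte boundary.
static inline bool is_aligned_8(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 7u) == 0;
}

// Same SSE2 sequence as in the answer, applied to the aligned arrays.
static inline __m128i load_aligned_halves() {
    assert(is_aligned_8(A8) && is_aligned_8(B8));  // sanity check, not a requirement
    __m128i a = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(A8));
    __m128i b = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(B8));
    return _mm_unpacklo_epi64(a, b);
}
```

The assert documents the assumption only. The loads themselves would still work if the arrays were unaligned.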

Mysticial answered Oct 20 '22 17:10