
What's the best way to load 2 unaligned 64-bit values into an sse register with SSSE3?

There are two pointers to two unaligned 8-byte chunks that need to be loaded into a single xmm register, preferably using intrinsics and, if possible, without an auxiliary register. pinsrd is not an option (the target is an SSSE3 Core 2).

alecco asked Aug 27 '11 23:08



2 Answers

According to the MSVC intrinsics documentation, it looks like you can do the following:

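// requires <emmintrin.h> (SSE2)
__m128d xx = _mm_setzero_pd();                // start from a defined value rather than reading an uninitialised variable
xx = _mm_loadh_pd(xx, (const double*)ptra);   // load the higher 64 bits from (unaligned) ptra
xx = _mm_loadl_pd(xx, (const double*)ptrb);   // load the lower 64 bits from (unaligned) ptrb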
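
The two-load idea above can be wrapped into a small helper and checked against a byte buffer. This is only a sketch under stated assumptions: the names `load_2x64` and `check_load_2x64` are hypothetical, the result is returned as an integer vector, and the low half is loaded with `_mm_loadl_epi64` (which zeroes the upper half) instead of starting from a separate variable. Only SSE2 intrinsics are used, so it should build for a Core 2 target.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadl_epi64, _mm_loadh_pd, cast intrinsics */
#include <string.h>     /* memcmp */

/* Hypothetical helper: the low 64 bits come from lo, the high 64 bits from hi.
   Neither pointer needs any particular alignment. */
static __m128i load_2x64(const void *lo, const void *hi)
{
    /* Load 8 bytes into the low half; the upper half is zeroed. */
    __m128i v = _mm_loadl_epi64((const __m128i *)lo);
    /* Fill the upper half from hi; the low half is kept as is. */
    __m128d d = _mm_loadh_pd(_mm_castsi128_pd(v), (const double *)hi);
    /* The casts only reinterpret the register; they generate no instructions. */
    return _mm_castpd_si128(d);
}

/* Returns 1 if the helper picks up the expected bytes from two
   deliberately misaligned addresses, 0 otherwise. */
static int check_load_2x64(void)
{
    unsigned char buf[32], out[16];
    for (int i = 0; i < 32; ++i)
        buf[i] = (unsigned char)i;

    _mm_storeu_si128((__m128i *)out, load_2x64(buf + 1, buf + 19));

    return memcmp(out, buf + 1, 8) == 0 && memcmp(out + 8, buf + 19, 8) == 0;
}
```

Calling `check_load_2x64()` from a test or from `main` is enough to confirm which half each pointer ends up in; swap the two arguments if you want the opposite order.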

In my experience, loading from unaligned storage is much slower than loading from aligned pointers, so you probably don't want to do this kind of operation too often if you are after high performance.

Hope this helps.

Darren Engwirda answered Sep 27 '22 21:09



Unaligned access is so much slower than aligned access (at least pre-Nehalem) that you may get better speed by loading the aligned 128-bit words that contain the desired unaligned 64-bit words and then shuffling them into the result you want.

Assumes:

  • you have memory read access to the full 128-bit word
  • the 64-bit words are aligned on at least 32-bit boundaries

e.g. (not tested)

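// requires <xmmintrin.h> and <stdint.h>
int aoff = (int)((uintptr_t)ptra & 15);   // byte offset of ptra within its aligned 16-byte block
int boff = (int)((uintptr_t)ptrb & 15);   // byte offset of ptrb within its aligned 16-byte block
__m128 va = _mm_load_ps( (const float*)((const char*)ptra - aoff) );   // aligned load of the block containing *ptra
__m128 vb = _mm_load_ps( (const float*)((const char*)ptrb - boff) );   // aligned load of the block containing *ptrb

switch ( (aoff<<4) | boff )   // one case per offset combination
{
    case 0:  _mm_shuffle_ps(va,vb, ...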

The number of cases depends on whether you can assume 64-bit alignment.
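
To make the case count concrete, here is a minimal sketch of the simplest variant, assuming both pointers are 8-byte aligned, so each value sits in either the low or the high half of its aligned 16-byte block and only four cases remain. It uses `_mm_shuffle_pd` on 64-bit lanes rather than the `_mm_shuffle_ps` call the snippet above starts with. The names `load_2x64_aligned8` and `check_load_2x64_aligned8`, and the choice to put `*pa` in the low half and `*pb` in the high half, are assumptions made for illustration. As stated in the list of assumptions, the whole 16-byte block around each pointer must be readable.

```c
#include <emmintrin.h>  /* SSE2: _mm_load_pd, _mm_shuffle_pd */
#include <stdint.h>     /* uintptr_t */
#include <string.h>     /* memcmp */

/* Hypothetical helper. Assumes pa and pb are 8-byte aligned and that the
   aligned 16-byte block containing each of them is readable.
   Result: low 64 bits = *pa, high 64 bits = *pb. */
static __m128i load_2x64_aligned8(const void *pa, const void *pb)
{
    uintptr_t a = (uintptr_t)pa, b = (uintptr_t)pb;

    /* Aligned loads of the 16-byte blocks that contain the two values. */
    __m128d va = _mm_load_pd((const double *)(a & ~(uintptr_t)15));
    __m128d vb = _mm_load_pd((const double *)(b & ~(uintptr_t)15));
    __m128d r;

    /* Bit 0: *pa is in the high half of va. Bit 1: *pb is in the high half of vb.
       The shuffle immediate must be a compile-time constant, hence the switch. */
    switch ((int)(((a >> 3) & 1) | (((b >> 3) & 1) << 1))) {
    case 0:  r = _mm_shuffle_pd(va, vb, 0); break;  /* va low,  vb low  */
    case 1:  r = _mm_shuffle_pd(va, vb, 1); break;  /* va high, vb low  */
    case 2:  r = _mm_shuffle_pd(va, vb, 2); break;  /* va low,  vb high */
    default: r = _mm_shuffle_pd(va, vb, 3); break;  /* va high, vb high */
    }
    return _mm_castpd_si128(r);
}

/* Returns 1 if all four offset combinations give the expected bytes. */
static int check_load_2x64_aligned8(void)
{
    __m128i storage[2];                             /* 32 bytes, 16-byte aligned */
    unsigned char *buf = (unsigned char *)storage;
    unsigned char out[16];
    for (int i = 0; i < 32; ++i)
        buf[i] = (unsigned char)i;

    for (int a = 0; a <= 8; a += 8)
        for (int b = 16; b <= 24; b += 8) {
            _mm_storeu_si128((__m128i *)out, load_2x64_aligned8(buf + a, buf + b));
            if (memcmp(out, buf + a, 8) != 0 || memcmp(out + 8, buf + b, 8) != 0)
                return 0;
        }
    return 1;
}
```

Whether the extra branch pays for itself compared with two plain 64-bit loads depends on the target CPU and on how predictable the offsets are, so it is worth measuring both before committing to this form.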

Mark Borgerding answered Sep 27 '22 21:09
