What's the best way to load 2 unaligned 64-bit values into an sse register with SSSE3?

Question

There are 2 pointers to 2 unaligned 8-byte chunks that need to be loaded into a single xmm register. If possible, using intrinsics. And if possible, without using an auxiliary register. Without pinsrd. (SSSE3, Core 2)

Darren Engwirda · Accepted Answer

From the MSVC intrinsics documentation, it looks like you can do the following:

__m128d xx = _mm_setzero_pd();                // start from a defined value instead of reading an uninitialised variable
xx = _mm_loadh_pd(xx, (const double*)ptra);   // load the upper 64 bits from (unaligned) ptra
xx = _mm_loadl_pd(xx, (const double*)ptrb);   // load the lower 64 bits from (unaligned) ptrb
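For an integer result, the same two-step load can be wrapped in a small helper. This is only a sketch: the name load_2x64_unaligned and its parameter order are made up for illustration, and it assumes an SSE2-capable compiler that provides <emmintrin.h>. Neither 64-bit load instruction used here requires an aligned address.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not from the original answer):
 * builds one 128-bit value from two independent 8-byte chunks.
 * 'lo' ends up in bits 0..63, 'hi' in bits 64..127. */
static __m128i load_2x64_unaligned(const void *lo, const void *hi)
{
    /* 64-bit load into the low half; the upper half is zeroed. */
    __m128i v = _mm_loadl_epi64((const __m128i *)lo);

    /* Overwrite the upper half with the second chunk, keeping the
     * low half. The casts only reinterpret the register contents. */
    __m128d d = _mm_loadh_pd(_mm_castsi128_pd(v), (const double *)hi);

    return _mm_castpd_si128(d);
}
```

Both loads write into the same register, so no second xmm register is needed to combine the halves.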

Loading from unaligned storage is (in my experience) very much slower than loading from aligned pointers, so you probably wouldn't want to be doing this type of operation too often if you really want higher performance.

Hope this helps.

Mark Borgerding · Answer

Unaligned access is much slower than aligned access (at least pre-Nehalem), so you may get better speed by loading the aligned 128-bit words that contain the desired unaligned 64-bit words, then shuffling them to make the result you want.

Assumes:

you have memory read access to the full 128-bit word
the 64-bit words are aligned on at least 32-bit boundaries

e.g. (not tested)

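uintptr_t aoff = (uintptr_t)ptra & 15;   // byte offset of ptra inside its 16-byte block (uintptr_t is from <stdint.h>)
uintptr_t boff = (uintptr_t)ptrb & 15;   // a pointer must be cast to an integer before masking
__m128 va = _mm_load_ps( (const float*)((const char*)ptra - aoff) ); 
__m128 vb = _mm_load_ps( (const float*)((const char*)ptrb - boff) ); 

switch ( (aoff<<4) | boff ) 
{
    case 0:  _mm_shuffle_ps(va,vb, ...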

The number of cases depends on whether you can assume 64-bit alignment.
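To illustrate the simplest variant, here is a sketch that assumes both pointers are 8-byte aligned, which leaves only four cases. The function name load_2x64_via_aligned and the choice to put *pa in the low half are assumptions for illustration, not part of the answer above. It also assumes the whole 16-byte block around each pointer is readable. It uses _mm_shuffle_pd rather than _mm_shuffle_ps because each selection is a whole 64-bit lane.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: assumes pa and pb are each 8-byte aligned and
 * that the enclosing 16-byte blocks may be read in full.
 * Result: low 64 bits = *pa, high 64 bits = *pb. */
static __m128d load_2x64_via_aligned(const double *pa, const double *pb)
{
    uintptr_t aoff = (uintptr_t)pa & 15;  /* 0 or 8 under the assumption */
    uintptr_t boff = (uintptr_t)pb & 15;  /* 0 or 8 under the assumption */

    /* Aligned loads of the 16-byte blocks containing each value. */
    __m128d va = _mm_load_pd((const double *)((const char *)pa - aoff));
    __m128d vb = _mm_load_pd((const double *)((const char *)pb - boff));

    /* The shuffle immediate must be a compile-time constant, hence the
     * switch. Bit 0 picks the lane of va, bit 1 picks the lane of vb. */
    switch ((aoff >> 3) | ((boff >> 3) << 1)) {
    case 0:  return _mm_shuffle_pd(va, vb, 0);  /* va low,  vb low  */
    case 1:  return _mm_shuffle_pd(va, vb, 1);  /* va high, vb low  */
    case 2:  return _mm_shuffle_pd(va, vb, 2);  /* va low,  vb high */
    default: return _mm_shuffle_pd(va, vb, 3);  /* va high, vb high */
    }
}
```

With only 32-bit alignment guaranteed, a value can straddle two 16-byte blocks, so more loads and more cases would be needed than this sketch handles.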

Tags:

simd

sse

intrinsics

alecco
