So I am trying to use the SSE intrinsic _mm_load_si128. I am very new to SSE, so forgive me if I have made some silly mistakes somewhere.
Here is the code:
void one(__m128i *arr, char *temp)
{
    // SSE needs 16 byte alignment.
    _declspec (align(16)) __m128i *tmp = (__m128i*) temp;
    if (((uintptr_t)tmp & 15) == 0)
        printf("Aligned pointer");
    else
        printf("%d", ((uintptr_t)tmp & 15)); // This prints as 12
    arr[0] = _mm_load_si128(tmp);
}
I get an error in Visual Studio:
0xC0000005: Access violation reading location 0xFFFFFFFF.
The address 0xFFFFFFFF does not look right. What am I doing wrong?
The arr argument is initialized as __m128i arr[5] = { 0 };
An alternative would be to use _mm_loadu_si128, which works fine, but as I understand it, it should produce a movdqu instruction. This is the assembly that is generated instead:
arr[0] = _mm_loadu_si128(tmp);
00D347F1 mov eax,dword ptr [tmp]
00D347F4 movups xmm0,xmmword ptr [eax]
00D347F7 movaps xmmword ptr [ebp-100h],xmm0
00D347FE mov ecx,10h
00D34803 imul edx,ecx,0
00D34806 add edx,dword ptr [arr]
00D34809 movups xmm0,xmmword ptr [ebp-100h]
00D34810 movups xmmword ptr [edx],xmm0
Thanks, guys. From the answers I realize I have made a couple of mistakes:
Align the source: use _aligned_malloc.
Compile with optimizations.
Use C++ casts, not C casts.
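Put together, those three fixes might look roughly like the sketch below. It is an illustration under stated assumptions, not the asker's actual code. The source buffer comes from _aligned_malloc when compiled with MSVC and, as an assumption for other compilers, from C++17's std::aligned_alloc. The pointer conversion uses reinterpret_cast, and _mm_load_si128 is only ever handed 16-byte-aligned memory. The helpers alloc16 and free16 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <emmintrin.h>   // SSE2: __m128i, _mm_load_si128
#ifdef _MSC_VER
#include <malloc.h>      // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: allocate a 16-byte-aligned block.
// std::aligned_alloc (C++17) requires size to be a multiple of 16.
void* alloc16(std::size_t size) {
#ifdef _MSC_VER
    return _aligned_malloc(size, 16);
#else
    return std::aligned_alloc(16, size);
#endif
}

// Hypothetical helper: release memory obtained from alloc16.
void free16(void* p) {
#ifdef _MSC_VER
    _aligned_free(p);    // _aligned_malloc memory must not be passed to free()
#else
    std::free(p);
#endif
}

// Same job as one() in the question: load the first 16 bytes of temp into arr[0].
// Precondition: temp points to 16-byte-aligned memory (e.g. from alloc16).
void one(__m128i* arr, const char* temp) {
    assert((reinterpret_cast<std::uintptr_t>(temp) & 15) == 0);
    const __m128i* tmp = reinterpret_cast<const __m128i*>(temp);
    arr[0] = _mm_load_si128(tmp);   // aligned 16-byte load
}
```

A caller would allocate with alloc16(64), fill the bytes, call one(arr, temp), and release the block with free16(temp). The alignment now comes from the allocation itself rather than from an attribute on the pointer.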
I can see three problems here: one, it's impossible to change the alignment of arr or temp after the fact. Let's focus on point number 2 for a second: there's a pointer, and there's what the pointer points to. I guess you already know the difference between these two.
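To make the pointer versus pointee distinction concrete, here is a small sketch that reproduces the situation from the question. It uses the standard alignas in place of MSVC's _declspec(align(16)) so that it also builds outside Visual Studio, assuming both make the same request when applied to a variable. The names misalignment16, buffer and tmp are hypothetical.

```cpp
#include <cstdint>

// Returns the low 4 bits of an address, i.e. how far it is past a 16-byte boundary.
inline unsigned misalignment16(const void* p) {
    return static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(p) & 15);
}

// 16-byte-aligned storage, so buffer + 4 is guaranteed to be 4 bytes past a boundary.
alignas(16) char buffer[64];

// The pointer variable itself is placed on a 16-byte boundary,
// but the address it holds is 4 bytes into the buffer.
alignas(16) char* tmp = buffer + 4;
```

Here misalignment16(&tmp) is 0 while misalignment16(tmp) is 4. Aligning the variable tmp says nothing about the address stored in it, which is why the question's check printed 12 despite the attribute.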
Basically, when you write _declspec (align(16)) __m128i *tmp
you tell the program:
When you allocate the pointer tmp on the stack, make sure the first byte of tmp is placed at an address (on the stack) that is divisible by 16.
So great, tmp itself is aligned to 16, but that doesn't affect what tmp points to at all. You need temp to point to aligned data already. This can be done with the alignas keyword (alignas(16) char my_buffer[16*100];), aligned_alloc, or MSVC's _aligned_malloc, which requires _aligned_free. See How to solve the 32-byte-alignment issue for AVX load/store operations?
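As an illustration of the alignas route, here is a minimal sketch built around the my_buffer declaration above. The buffer is aligned when it is defined, so every 16-byte block inside it starts on a 16-byte boundary and can safely be read with _mm_load_si128. The names kVectors and load_all are hypothetical.

```cpp
#include <cstddef>
#include <emmintrin.h>   // SSE2: __m128i, _mm_load_si128

constexpr std::size_t kVectors = 100;

// Storage that is 16-byte aligned from the start,
// as in alignas(16) char my_buffer[16*100];
alignas(16) char my_buffer[16 * kVectors];

// Loads every 16-byte block of my_buffer with the aligned-load intrinsic.
// Block i starts at my_buffer + 16*i, so each address is a multiple of 16.
void load_all(__m128i* dst) {
    for (std::size_t i = 0; i < kVectors; ++i) {
        const __m128i* src = reinterpret_cast<const __m128i*>(my_buffer + 16 * i);
        dst[i] = _mm_load_si128(src);
    }
}
```

The same loop would fault if it started at my_buffer + 4, because the alignment guarantee covers only offsets that are multiples of 16.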
You cannot align memory retroactively; it has to be allocated aligned in the first place. Make sure the data passed in through temp is already aligned, or use unaligned loads/stores if you can't require callers to pass aligned data.
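If callers cannot promise alignment, one option is a load helper that checks the address and falls back to the unaligned intrinsic. The sketch below is an assumption-level illustration, and load16 is a hypothetical name.

```cpp
#include <cstdint>
#include <emmintrin.h>   // SSE2: __m128i, _mm_load_si128, _mm_loadu_si128

// Loads 16 bytes from p regardless of its alignment. The aligned intrinsic is
// used only when p sits on a 16-byte boundary; otherwise the unaligned one is.
inline __m128i load16(const void* p) {
    if ((reinterpret_cast<std::uintptr_t>(p) & 15) == 0)
        return _mm_load_si128(static_cast<const __m128i*>(p));
    return _mm_loadu_si128(static_cast<const __m128i*>(p));
}
```

On most current x86 CPUs, an unaligned load of data that happens to be aligned is typically as fast as an aligned load. For that reason, calling _mm_loadu_si128 unconditionally is a common and simpler choice than branching.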