Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

_declspec(align(16)) does not align the pointer to 16 bytes

Tags:

c++

sse

So I am trying to use the SSE function __mm_load_128, I am very new to SSE fo forgive me if I have made some silly mistakes somewhere.

Here is the code

void one(__m128i *arr, char *temp)
{
    // SSE needs 16 byte alignment.
    _declspec (align(16)) __m128i *tmp = (__m128i*) temp;

    if (((uintptr_t)tmp & 15) == 0)
        printf("Aligned pointer");
    else 
        printf("%d", ((uintptr_t)tmp & 15)); // This prints as 12

    arr[0] = _mm_load_si128(tmp);
}

I get an error on visual studio 0xC0000005: Access violation reading location 0xFFFFFFFF.

0xFFFFFFFF does not look right, what am I doing wrong.

arr argument is initialized as _m128i arr[5] = { 0 }

Alternative would be to use _mm_loadu_128 which works fine but as I understand it, It should produce movdqu instruction but this is the assembly generated

    arr[0] = _mm_loadu_si128(tmp);
00D347F1  mov         eax,dword ptr [tmp]  
00D347F4  movups      xmm0,xmmword ptr [eax]  
00D347F7  movaps      xmmword ptr [ebp-100h],xmm0  
00D347FE  mov         ecx,10h  
00D34803  imul        edx,ecx,0  
00D34806  add         edx,dword ptr [arr]  
00D34809  movups      xmm0,xmmword ptr [ebp-100h]  
00D34810  movups      xmmword ptr [edx],xmm0 

Thanks guys, From the answers I realize I have made couple of mistakes.

  1. Align the source use _alinged_malloc

  2. Compile with optimizations.

  3. Use C++ casts not C

like image 409
qwn Avatar asked Jan 29 '23 11:01

qwn


1 Answers

I can see three problems here:

  1. this code is strictly C, and not C++. it's not a problem per se, but the question is tagged as C++.
  2. You don't distinguish between the alignment of the pointer vs the alignment of what the pointer points to
  3. when the code flow is inside one, it's impossible to change the alignment of arr or temp.

Let's focus on point number 2 for a second - there's a pointer, and there's what the pointer points to. I guess you already know the difference between these two.

basically , when you write _declspec (align(16)) __m128i *tmp you tell the program:

When you allocate the pointer tmp on the stack, make sure the the first byte of tmp is allocated on an address (on the stack) which is dividable by 16.

So great, tmp itself is aligned to 16, it doesn't affect at all what tmp points to. you need temp to point to align data already. this can be done by

  1. allocate the data on the stack with proper alignas keyword (alignas(16) char my_buffer[16*100];)
  2. allocate dynamic data with memory allocating functions capable of allocating align data, like aligned_alloc, or MSVC's _aligned_malloc which requires _aligned_free. See How to solve the 32-byte-alignment issue for AVX load/store operations?

You cannot align memory retroactively, it has to be allocated aligned in the first place. make sure the data passed by temp is already aligned, or use unaligned loads/stores if you can't require callers to pass aligned data.

like image 199
David Haim Avatar answered Feb 05 '23 17:02

David Haim