I am trying to implement SSE vectorization on a piece of code for which I need my 1D array to be 16 byte memory aligned. However, I have tried several ways to allocate 16byte memory aligned data but it ends up being 4byte memory aligned.
I have to work with the Intel icc compiler. This is a sample code I am testing with:
#include <stdio.h>
#include <stdlib.h>
void error(char *str)
{
printf("Error:%s\n",str);
exit(-1);
}
int main()
{
int i;
//float *A=NULL;
float *A = (float*) memalign(16,20*sizeof(float));
//align
// if (posix_memalign((void **)&A, 16, 20*sizeof(void*)) != 0)
// error("Cannot align");
for(i = 0; i < 20; i++)
printf("&A[%d] = %p\n",i,&A[i]);
free(A);
return 0;
}
This is the output I get:
&A[0] = 0x11fe010
&A[1] = 0x11fe014
&A[2] = 0x11fe018
&A[3] = 0x11fe01c
&A[4] = 0x11fe020
&A[5] = 0x11fe024
&A[6] = 0x11fe028
&A[7] = 0x11fe02c
&A[8] = 0x11fe030
&A[9] = 0x11fe034
&A[10] = 0x11fe038
&A[11] = 0x11fe03c
&A[12] = 0x11fe040
&A[13] = 0x11fe044
&A[14] = 0x11fe048
&A[15] = 0x11fe04c
&A[16] = 0x11fe050
&A[17] = 0x11fe054
&A[18] = 0x11fe058
&A[19] = 0x11fe05c
It is 4byte aligned everytime, i have used both memalign, posix memalign. Since I am working on Linux, I cannot use _mm_malloc neither can I use _aligned_malloc. I get a memory corruption error when I try to use _aligned_attribute (which is suitable for gcc alone I think).
Can anyone assist me in accurately generating 16byte memory aligned data for icc on linux platform.
The aligned_alloc function allocates space for an object whose alignment is specified by alignment, whose size is specified by size, and whose value is indeterminate.
This effectively means that the address of the memory your data resides in needs to be divisible by the number of bytes required by the instruction. So in your case the alignment is 16 bytes (128 bits), which means the memory address of your data needs to be a multiple of 16.
The GNU documentation states that malloc is aligned to 16 byte multiples on 64 bit systems.
Since many CPUs as well as many operation system do have alignment requirements, most malloc implementation will always return aligned memory but which alignment rules it follows is system specific.
The memory you allocate is 16-byte aligned. See:&A[0] = 0x11fe010
But in an array of float
, each element is 4 bytes, so the second is 4-byte aligned.
You can use an array of structures, each containing a single float, with the aligned
attribute:
struct x {
float y;
} __attribute__((aligned(16)));
struct x *A = memalign(...);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With