I have a question about the new aligned clause in OpenMP, specifically when it is used as #pragma omp simd aligned(a:n).
Say I have an array of integers that I allocated with posix_memalign, so I know that the array starts on, let's say, a 32-byte boundary. Now let's say I want to square every value in that array. Can I write the following?
int *array; /* some array of length len, allocated with posix_memalign on a 32-byte boundary */
#pragma omp simd aligned(array:32)
for (int i = 0; i < len; i++)
    array[i] *= array[i];
Is this a safe assumption? Or does aligned also imply that the size of the data type I am using in the array (int) is a multiple of 32 bytes, kind of like how __attribute__((aligned(32))) in GCC makes the type at least 32 bytes wide?
To make sure we understand each other, let's assume your array is indeed 256-bit aligned (which is equivalent to your 32-byte alignment).
Then, yes, your #pragma omp simd aligned(array:32) is safe, irrespective of the length of the array or the size of its element type. The only thing that matters is the address held by the "pointer" used to reference the array.
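A minimal sketch of the question's loop, assuming a POSIX system (for posix_memalign) and a compiler that honours the simd pragma (for example GCC or Clang with -fopenmp or -fopenmp-simd; without such a flag the pragma is simply ignored and the loop stays correct). The helper names alloc_ints_aligned32 and square_aligned are hypothetical. Only the starting address is promised to be 32-byte aligned; the length can be anything.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* expose posix_memalign in strict C modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate len ints whose first element sits on a 32-byte boundary.
   Returns NULL on failure. Only the start address is aligned; the
   elements themselves are still ordinary ints (sizeof(int) bytes each). */
int *alloc_ints_aligned32(size_t len)
{
    void *p = NULL;
    if (posix_memalign(&p, 32, len * sizeof(int)) != 0)
        return NULL;
    return p;
}

/* Square every element in place. aligned(array:32) only promises that
   array holds a 32-byte-aligned address; len may be any value. */
void square_aligned(int *array, int len)
{
    /* Debug-build check that the promise actually holds. */
    assert((uintptr_t)array % 32 == 0);
#pragma omp simd aligned(array:32)
    for (int i = 0; i < len; i++)
        array[i] *= array[i];
}
```

For example, allocating 7 ints (not a multiple of 32 bytes) and passing them to square_aligned is fine, because the clause says nothing about the length. The buffer is released with free() as usual.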
EDIT: I realised that my answer, although correct, was a bit dry, since it was only my word without any "official" support. So here are some excerpts from the standard to back it up:
From the OpenMP 4.0 standard §2.8.1:
[C/C++: The aligned clause declares that the object to which each list item points is aligned to the number of bytes expressed in the optional parameter of the aligned clause.]
The optional parameter of the aligned clause, alignment, must be a constant positive integer expression. If no optional parameter is specified, implementation-defined default alignments for SIMD instructions on the target platforms are assumed.
[...]
[C: The type of list items appearing in the aligned clause must be array or pointer.]
[C++: The type of list items appearing in the aligned clause must be array, pointer, reference to array, or reference to pointer.]
As you can see, there are no assumptions about the type of the data pointed to or referenced by the variable used inside the aligned clause. The only assumption is that the address of the memory it points to is aligned to the number of bytes given by the optional parameter, or to some "implementation-defined default alignments" (which, by the way, strongly encourages me to always specify this optional parameter, since I have no idea what this implementation-defined default might be and, more to the point, whether my array is indeed aligned that way).
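To illustrate that the element type plays no role, here is a sketch that applies the same clause to 1-byte and 8-byte element types, with lengths that are not multiples of 32 bytes, and always spells out the alignment instead of relying on the implementation-defined default. It assumes C11 for _Alignas; the function and array names are hypothetical.

```c
#include <stddef.h>

/* The clause describes only where buf points, so 1-byte elements and an
   odd length are fine. The alignment (32) is always given explicitly. */
void add_one_chars(unsigned char *buf, size_t n)
{
#pragma omp simd aligned(buf:32)
    for (size_t i = 0; i < n; i++)
        buf[i] += 1;
}

/* Same clause, 8-byte elements: nothing about the type changes. */
void halve_doubles(double *x, size_t n)
{
#pragma omp simd aligned(x:32)
    for (size_t i = 0; i < n; i++)
        x[i] *= 0.5;
}

/* _Alignas (C11) places these static arrays on a 32-byte boundary, so the
   promise made by the aligned clauses above holds when they are passed in. */
_Alignas(32) unsigned char demo_bytes[5] = {1, 2, 3, 4, 5};
_Alignas(32) double demo_values[3] = {2.0, 4.0, 6.0};
```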
aligned(ptr:n) tells the compiler that the array behind ptr begins at an address aligned to n bytes. This helps the compiler decide how to vectorise the loop optimally. Since many vector units require vector loads and stores to be aligned, if the compiler cannot infer the alignment of the data at compile time, it has to generate runtime code that checks the alignment and then performs the unaligned portions of the loop (both at the beginning and at the end of the iteration space) using scalar instructions. Those checks are time consuming, especially for short arrays. If the proper alignment is known at compile time, the compiler can emit the needed scalar operations directly, without the runtime checks. With AVX-512 (Intel Xeon Phi), unaligned loads and stores are performed using masking, and providing the correct alignment allows the compiler to emit the masked instructions directly as needed instead of computing the masks at run time.
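The peeling described above can be modelled with a little arithmetic. The sketch below is a conceptual model, not what any particular compiler emits: given a start address, an element count, an element size and the required vector alignment, it computes how many leading scalar iterations (head), full-vector iterations (body) and trailing scalar iterations (tail) a vectoriser would need. The loop_split struct and split_for_alignment function are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* head: scalar iterations before the first aligned address
   body: iterations covered by whole vectors
   tail: leftover scalar iterations at the end */
typedef struct {
    size_t head, body, tail;
} loop_split;

/* Assumes align is a multiple of elem and addr is a multiple of elem. */
loop_split split_for_alignment(uintptr_t addr, size_t n, size_t elem, size_t align)
{
    loop_split s = {0, 0, 0};
    size_t lanes = align / elem;             /* elements per vector */
    size_t misalign = (size_t)(addr % align);
    if (misalign != 0)
        s.head = (align - misalign) / elem;  /* steps to reach the boundary */
    if (s.head > n)
        s.head = n;
    size_t rest = n - s.head;
    s.body = rest - rest % lanes;            /* whole vectors only */
    s.tail = rest - s.body;
    return s;
}
```

For 4-byte ints and 32-byte vectors, a start address 8 bytes past a boundary forces 6 scalar head iterations before the first aligned vector, whereas aligned(ptr:32) lets the compiler assume the head is 0 without checking.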