Does GCC have a memory alignment pragma, akin to #pragma vector aligned in the Intel compiler?
I would like to tell the compiler to optimize a particular loop using aligned load/store instructions. To avoid possible confusion, this is not about struct packing.
For example:
#if defined (__INTEL_COMPILER)
#pragma vector aligned
#endif
for (int a = 0; a < int(N); ++a) {
    q10 += Ix(a,0,0)*Iy(a,1,1)*Iz(a,0,0);
    q11 += Ix(a,0,0)*Iy(a,0,1)*Iz(a,1,0);
    q12 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,0,1);
    q13 += Ix(a,1,0)*Iy(a,0,0)*Iz(a,0,1);
    q14 += Ix(a,0,0)*Iy(a,1,0)*Iz(a,0,1);
    q15 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,1,1);
}
Thanks
You can tell GCC that a pointer points to aligned memory by using a typedef to create an over-aligned type that you can declare pointers to.
This helps GCC but not clang 7.0 or ICC 19; see the x86-64 non-AVX asm they emit on Godbolt. (Only GCC folds a load into a memory operand for mulps, instead of using a separate movups.) You have to use __builtin_assume_aligned if you want to portably convey an alignment promise to GNU C compilers other than GCC itself.
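As a minimal sketch of the __builtin_assume_aligned route, assuming a GNU C compiler (GCC or clang) and a C11 library that provides aligned_alloc: the helper names sum_aligned and alloc_aligned_doubles are hypothetical and exist only for illustration. The builtin returns its argument, and the compiler may assume the returned pointer is aligned as stated; passing a pointer that is not actually 16-byte aligned is undefined behavior, so the sketch also checks the promise with assert in debug builds. Whether aligned vector instructions are actually emitted still depends on the compiler, target, and optimization flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: sums n doubles that the caller promises are
   16-byte aligned. */
double sum_aligned(const double *x, size_t n)
{
    /* Debug-build check that the promise holds; breaking it is
       undefined behavior once the builtin below is used. */
    assert(((uintptr_t)x % 16) == 0);

    /* The builtin returns x; the compiler may assume the result is
       16-byte aligned when optimizing the loop. */
    const double *ax = (const double *)__builtin_assume_aligned(x, 16);

    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += ax[i];
    return total;
}

/* Hypothetical helper: allocates room for n doubles on a 16-byte
   boundary. C11 aligned_alloc expects the size to be a multiple of
   the alignment, so the byte count is rounded up. */
double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + 15) / 16 * 16;
    return (double *)aligned_alloc(16, bytes);
}
```

The alignment promise is only as good as the allocation behind it, which is why the sketch pairs the builtin with an allocator that guarantees the boundary. Memory from alloc_aligned_doubles is released with free.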
From http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html
typedef double aligned_double __attribute__((aligned (16)));
// Note: sizeof(aligned_double) is 8, not 16
void some_function(aligned_double *x, aligned_double *y, int n)
{
    for (int i = 0; i < n; ++i) {
        // math!
    }
}
This won't make aligned_double 16 bytes wide. It only aligns it to a 16-byte boundary, or rather the first element in an array will be. Looking at the disassembly on my computer, as soon as I use the alignment directive, I start to see a lot of vector ops. I am using a Power architecture computer at the moment, so it's AltiVec code, but I think this does what you want.
(Note: I wasn't using double when I tested this, because AltiVec doesn't support double-precision floats.)
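The size-versus-alignment distinction can be checked at compile time. This is a small sketch assuming GCC or clang with C11 support; __alignof__ is a GNU extension, and the two accessor functions are hypothetical names added only so the values can be inspected at run time.

```c
#include <assert.h>
#include <stddef.h>

typedef double aligned_double __attribute__((aligned (16)));

/* The attribute raises the required alignment but leaves the size
   unchanged. */
static_assert(sizeof(aligned_double) == sizeof(double),
              "over-aligned typedef keeps the size of double");
static_assert(__alignof__(aligned_double) == 16,
              "over-aligned typedef requires a 16-byte boundary");

/* Hypothetical accessors for inspecting the same values at run time. */
size_t aligned_double_size(void)  { return sizeof(aligned_double); }
size_t aligned_double_align(void) { return __alignof__(aligned_double); }
```

Because the element size is smaller than the alignment, only the pointed-to base address can satisfy the 16-byte boundary, which matches the remark that only the first element of an array is aligned.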
You can see some other examples of autovectorization using the type attributes here: http://gcc.gnu.org/projects/tree-ssa/vectorization.html
I tried your solution with g++ version 4.5.2 (both Ubuntu and Windows) and it did not vectorize the loop.
If the alignment attribute is removed then it vectorizes the loop, using unaligned loads.
If the function is inlined so that the array can be accessed directly with the pointer eliminated, then it is vectorized with aligned loads.
In both cases, the alignment attribute prevents vectorization. This is ironic: the "aligned_double *x" parameter was supposed to enable vectorization, but it does the opposite.
Which compiler was it that reported vectorized loops for you? I suspect it was not a gcc compiler?