Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

gcc memory alignment pragma

Does gcc have memory alignment pragma, akin #pragma vector aligned in Intel compiler? I would like to tell compiler to optimize particular loop using aligned loads/store instructions. to avoid possible confusion, this is not about struct packing.

e.g:

#if defined (__INTEL_COMPILER)
#pragma vector aligned
#endif
        for (int a = 0; a < int(N); ++a) {
            q10 += Ix(a,0,0)*Iy(a,1,1)*Iz(a,0,0);
            q11 += Ix(a,0,0)*Iy(a,0,1)*Iz(a,1,0);
            q12 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,0,1);
            q13 += Ix(a,1,0)*Iy(a,0,0)*Iz(a,0,1);
            q14 += Ix(a,0,0)*Iy(a,1,0)*Iz(a,0,1);
            q15 += Ix(a,0,0)*Iy(a,0,0)*Iz(a,1,1);
        }

Thanks

like image 312
Anycorn Avatar asked Apr 21 '10 23:04

Anycorn


2 Answers

You can tell GCC that a pointer points to aligned memory by using a typedef to create an over-aligned type that you can declare pointers to.

This helps gcc but not clang7.0 or ICC19, see the x86-64 non-AVX asm they emit on Godbolt. (Only GCC folds a load into a memory operand for mulps, instead of using a separate movups). You have have to use __builtin_assume_aligned if you want to portably convey an alignment promise to GNU C compilers other than GCC itself.


From http://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html

typedef double aligned_double __attribute__((aligned (16)));
// Note: sizeof(aligned_double) is 8, not 16
void some_function(aligned_double *x, aligned_double *y, int n)
{
    for (int i = 0; i < n; ++i) {
        // math!
    }
}

This won't make aligned_double 16 bytes wide. This will just make it aligned to a 16-byte boundary, or rather the first one in an array will be. Looking at the disassembly on my computer, as soon as I use the alignment directive, I start to see a LOT of vector ops. I am using a Power architecture computer at the moment so it's altivec code, but I think this does what you want.

(Note: I wasn't using double when I tested this, because there altivec doesn't support double floats.)

You can see some other examples of autovectorization using the type attributes here: http://gcc.gnu.org/projects/tree-ssa/vectorization.html

like image 130
Dietrich Epp Avatar answered Sep 24 '22 08:09

Dietrich Epp


I tried your solution with g++ version 4.5.2 (both Ubuntu and Windows) and it did not vectorize the loop.

If the alignment attribute is removed then it vectorizes the loop, using unaligned loads.

If the function is inlined so that the array can be accessed directly with the pointer eliminated, then it is vectorized with aligned loads.

In both cases, the alignment attribute prevents vectorization. This is ironic: The "aligned_double *x" was supposed to enable vectorization but it does the opposite.

Which compiler was it that reported vectorized loops for you? I suspect it was not a gcc compiler?

like image 38
A Fog Avatar answered Sep 22 '22 08:09

A Fog