
Tell C++ that pointer data is 16 byte aligned

I wrote some code with static arrays and it vectorizes just fine.

float data[1024] __attribute__((aligned(16)));

I would like to make the arrays dynamically allocated. I tried doing something like this:

float *data = (float*) aligned_alloc(16, size*sizeof(float));

But the compiler (GCC 4.9.2) can no longer vectorize the code. I assume this is because it doesn't know that the pointer data is 16-byte aligned. I am getting messages like:

note: Unknown alignment for access: *_43

I have tried adding this line before the data is used, but it doesn't seem to do anything:

data = (float*) __builtin_assume_aligned(data, 16);

Using a different variable and restrict did not help:

float* __restrict__ align_data = (float*) __builtin_assume_aligned(data,16);

Example:

#include <iostream>
#include <stdlib.h>
#include <math.h>

#define SIZE 1024
#define DYNAMIC 0
#define A16 __attribute__((aligned(16)))
#define DA16 (float*) aligned_alloc(16, size*sizeof(float))

class Test{
public:
    int size;
#if DYNAMIC
    float *pos;
    float *vel;
    float *alpha;
    float *k_inv;
    float *osc_sin;
    float *osc_cos;
    float *dosc1;
    float *dosc2;
#else
    float pos[SIZE] A16;
    float vel[SIZE] A16;
    float alpha[SIZE] A16;
    float k_inv[SIZE] A16;
    float osc_sin[SIZE] A16;
    float osc_cos[SIZE] A16;
    float dosc1[SIZE] A16;
    float dosc2[SIZE] A16;
#endif
    Test(int arr_size){
        size = arr_size;
#if DYNAMIC
        pos = DA16;
        vel = DA16;
        alpha = DA16;
        k_inv = DA16;
        osc_sin = DA16;
        osc_cos = DA16;
        dosc1 = DA16;
        dosc2 = DA16;
#endif
    }
    void compute(){
        for (int i=0; i<size; i++){
            float lambda = .67891*k_inv[i],
                omega = (.89 - 2*alpha[i]*lambda)*k_inv[i],
                diff2 = pos[i] - omega,
                diff1 = vel[i] - lambda + alpha[i]*diff2;
            pos[i] = osc_sin[i]*diff1 + osc_cos[i]*diff2 + lambda*.008 + omega;
            vel[i] = dosc1[i]*diff1 - dosc2[i]*diff2 + lambda;
        }
    }
};

int main(int argc, char** argv){
    Test t(SIZE);
    t.compute();
    std::cout << t.pos[10] << std::endl;
    std::cout << t.vel[10] << std::endl;
}

Here is how I am compiling:

g++ -o test test.cpp -O3 -march=native -ffast-math -fopt-info-optimized

When DYNAMIC is set to 0, it outputs:

test.cpp:46:4: note: loop vectorized

but when it is set to 1 it outputs nothing.

Asked by Azmisov, Jun 17 '15 01:06




1 Answer

The compiler isn't vectorizing the loop because it can't determine that the dynamically allocated pointers don't alias each other. A simple way to allow your sample code to be vectorized is to pass the --param vect-max-version-for-alias-checks=1000 option. This will allow the compiler to emit all the checks necessary to see if the pointers are actually aliased.
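To make concrete what such runtime checks amount to, here is a hand-written sketch of the same idea for a two-pointer loop. The names disjoint, add_one_noalias, add_one_any and add_one are made up for illustration. This is only an analogy for the versioned loop the vectorizer may generate, not GCC's actual output:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when the n-float ranges starting at a and b do not overlap.
static bool disjoint(const float* a, const float* b, std::size_t n) {
    std::uintptr_t pa = reinterpret_cast<std::uintptr_t>(a);
    std::uintptr_t pb = reinterpret_cast<std::uintptr_t>(b);
    std::size_t bytes = n * sizeof(float);
    return pa + bytes <= pb || pb + bytes <= pa;
}

// Fast version: the caller promises that dst and src do not alias.
static void add_one_noalias(float* __restrict__ dst,
                            const float* __restrict__ src, std::size_t n) {
    for (std::size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
}

// Fallback version: makes no promise about aliasing.
static void add_one_any(float* dst, const float* src, std::size_t n) {
    for (std::size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
}

// Hand-written analogue of loop versioning: test for overlap once,
// then pick the version of the loop that is safe to run.
static void add_one(float* dst, const float* src, std::size_t n) {
    if (disjoint(dst, src, n))
        add_one_noalias(dst, src, n);
    else
        add_one_any(dst, src, n);
}
```

With two pointers a single check is enough. The loop in the question touches eight arrays, so many more pairwise checks are needed, which is presumably why the limit has to be raised.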

Another simple solution that allows your example code to be vectorized is to rename main, as suggested by Marc Glisse in his comment. Functions named main apparently have certain optimizations disabled. When the function is named something else, GCC 4.9.2 can track the use of this->foo (and the other pointer members) in compute back to their allocations in Test().

However, I assume that something other than your class being used in a function named main prevented vectorization in your real code. A more general solution that allows your code to be vectorized without aliasing or alignment checks is to use the restrict keyword and the aligned attribute. Something like this:

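// Element type that tells GCC the pointed-to data is 16-byte aligned.
typedef float __attribute__((aligned(16))) float_a16;

// Kept out of line so the restrict and alignment information is not lost.
__attribute__((noinline))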
static void _compute(float_a16 * __restrict__ pos,
         float_a16 * __restrict__ vel,
         float_a16 * __restrict__ alpha,
         float_a16 * __restrict__ k_inv,
         float_a16 * __restrict__ osc_sin,
         float_a16 * __restrict__ osc_cos,
         float_a16 * __restrict__ dosc1,
         float_a16 * __restrict__ dosc2,
         int size) {
    for (int i=0; i<size; i++){
        float lambda = .67891*k_inv[i],
            omega = (.89 - 2*alpha[i]*lambda)*k_inv[i],
            diff2 = pos[i] - omega,
            diff1 = vel[i] - lambda + alpha[i]*diff2;
        pos[i] = osc_sin[i]*diff1 + osc_cos[i]*diff2 + lambda*.008 + omega;
        vel[i] = dosc1[i]*diff1 - dosc2[i]*diff2 + lambda;
    }
}

void compute() {
    _compute(pos, vel, alpha, k_inv, osc_sin, osc_cos, dosc1, dosc2,
         size);
}

The noinline attribute is critical, otherwise inlining can cause the pointers to lose their restrictedness and alignedness. The compiler seems to ignore the restrict keyword in contexts other than function parameters.
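As a rough sketch of how the pieces could fit together with dynamically allocated buffers, the following uses a reduced three-array kernel in place of the eight-array loop. alloc_a16, scale_add and Buffers are hypothetical names introduced here, and the rounding in alloc_a16 is an added precaution because C11 aligned_alloc expects the size to be a multiple of the alignment. Whether the loop is actually vectorized still has to be confirmed with -fopt-info-optimized on your compiler:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>

// Same over-aligned element type as above (GCC/Clang extension).
typedef float __attribute__((aligned(16))) float_a16;

// Hypothetical helper: allocate n floats on a 16-byte boundary. The byte
// count is rounded up to a multiple of 16 for aligned_alloc.
static float* alloc_a16(std::size_t n) {
    std::size_t bytes = (n * sizeof(float) + 15) / 16 * 16;
    return static_cast<float*>(aligned_alloc(16, bytes));
}

// Reduced stand-in for the loop in compute(): restrict-qualified, aligned
// parameters in a function that is kept out of line.
__attribute__((noinline))
static void scale_add(float_a16* __restrict__ out,
                      const float_a16* __restrict__ a,
                      const float_a16* __restrict__ b,
                      int size) {
    for (int i = 0; i < size; i++)
        out[i] = a[i] * b[i] + out[i];
}

// Hypothetical owner of the buffers; releases them in the destructor.
struct Buffers {
    int size;
    float *out, *a, *b;

    explicit Buffers(int n)
        : size(n), out(alloc_a16(n)), a(alloc_a16(n)), b(alloc_a16(n)) {
        assert(out && a && b);  // allocation must have succeeded
        for (int i = 0; i < n; i++) {
            out[i] = 1.0f;
            a[i] = float(i);
            b[i] = 2.0f;
        }
    }
    ~Buffers() { free(out); free(a); free(b); }
    Buffers(const Buffers&) = delete;
    Buffers& operator=(const Buffers&) = delete;

    // Plain float* members convert to the aligned, restricted parameters.
    void compute() { scale_add(out, a, b, size); }
};
```

The members stay ordinary float pointers, as in the question. Only the out-of-line helper sees the aligned and restricted types, so the promise is made in exactly one place.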

Answered by Ross Ridge, Sep 17 '22 13:09