
Why won't GCC auto-vectorize this loop?

I have the following C program (a simplification of my actual use case that exhibits the same behavior):

#include <stdlib.h>
#include <math.h>
int main(int argc, char ** argv) {
    const float * __restrict__ const input = malloc(20000*sizeof(float));
    float * __restrict__ const output = malloc(20000*sizeof(float));

    unsigned int pos=0;
    while(1) {
        unsigned int rest=100;
        for(unsigned int i=pos;i<pos+rest; i++) {
            output[i] = input[i] * 0.1;
        }

        pos+=rest;
        if(pos>10000) {
            break;
        }
    }
}

When I compile with

 -O3 -g -Wall -ftree-vectorizer-verbose=5 -msse -msse2 -msse3 -march=native -mtune=native --std=c99 -fPIC -ffast-math

I get the output

main.c:10: note: not vectorized: unhandled data-ref 

where line 10 is the inner for loop. When I looked up what this message means, the explanations I found suggested that the compiler thinks the pointers could alias, but they can't in my code, since both are declared with __restrict__. The same sources also suggested adding the -msse flags, but those don't seem to make any difference either. Any help?

Jeremy Salwen asked Feb 16 '11 22:02


1 Answer

It certainly seems like a bug. The following two functions are equivalent (both write output[0] through output[10099]), yet foo() is vectorised and bar() is not when compiling for an x86-64 target:

void foo(const float * restrict input, float * restrict output)
{
    unsigned int pos;
    for (pos = 0; pos < 10100; pos++)
        output[pos] = input[pos] * 0.1;
}

void bar(const float * restrict input, float * restrict output)
{
    unsigned int pos;
    unsigned int i;
    for (pos = 0; pos <= 10000; pos += 100)
        for (i = 0; i < 100; i++)
            output[pos + i] = input[pos + i] * 0.1;
}

Adding the -m32 flag, to compile for an x86 target instead, causes both functions to be vectorised.
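
One possible explanation, offered here as an assumption rather than something confirmed above: on x86-64, unsigned int is 32 bits wide while pointers are 64 bits, so the compiler has to account for pos + i wrapping around before it is widened into an address. With -m32 the index and the pointer have the same width, so that complication disappears. If that is the cause, a variant of bar() that uses size_t indices may be worth trying. The sketch below is untested against the GCC version in question, and bar_size_t, foo_ref and check_variants_agree are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of bar(): same arithmetic, but pointer-width indices.
   Assumption: size_t indices spare the compiler from reasoning about
   32-bit unsigned wraparound in output[pos + i]. */
void bar_size_t(const float * restrict input, float * restrict output)
{
    size_t pos;
    size_t i;
    for (pos = 0; pos <= 10000; pos += 100)
        for (i = 0; i < 100; i++)
            output[pos + i] = input[pos + i] * 0.1;
}

/* Flat reference loop, same body as foo(), repeated so this block
   compiles on its own. */
void foo_ref(const float * restrict input, float * restrict output)
{
    size_t pos;
    for (pos = 0; pos < 10100; pos++)
        output[pos] = input[pos] * 0.1;
}

/* Checks that both variants write identical values to all 10100 elements. */
void check_variants_agree(void)
{
    static float in[10100], flat[10100], nested[10100];
    size_t k;

    for (k = 0; k < 10100; k++)
        in[k] = (float)k * 0.5f;

    foo_ref(in, flat);
    bar_size_t(in, nested);

    for (k = 0; k < 10100; k++)
        assert(flat[k] == nested[k]);
}
```

Calling check_variants_agree() from main() only confirms that the rewrite preserves the results. Whether the loop is actually vectorised still has to be read from the compiler's vectoriser report, using the same flags as in the question.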

caf answered Oct 01 '22 03:10