Simple getter/accessor prevents vectorization - gcc bug?

Consider this minimal implementation of a fixed-capacity vector<int>:

#include <cstddef>

constexpr std::size_t capacity = 1000;

struct vec 
{
    int values[capacity];
    std::size_t _size = 0;    

    std::size_t size() const noexcept 
    { 
        return _size; 
    }

    void push(int x) 
    {
        values[size()] = x;
        ++_size;
    }
};

Given the following test case:

int main()
{
    vec v;
    for(std::size_t i{0}; i != capacity; ++i) 
    {
        v.push(i);
    }

    asm volatile("" : : "g"(&v) : "memory");
}

The compiler produces non-vectorized assembly: live example on godbolt.org

[Screenshot of the godbolt.org output: non-vectorized]


If I make any of the following changes...

  • values[size()] -> values[_size]

  • Add __attribute__((always_inline)) to size()

...then the compiler produces vectorized assembly: live example on godbolt.org

[Screenshot of the godbolt.org output: vectorized]
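The two changes from the list above can be sketched side by side as follows. The struct names vec_direct and vec_forced_inline and the helper fill are hypothetical, introduced only for this illustration; according to the question, either change was enough for GCC 7.x to vectorize the fill loop.

```cpp
#include <cstddef>

constexpr std::size_t capacity = 1000;

// Variant 1 (hypothetical name): index the array with the data member
// directly, so push() contains no call to size().
struct vec_direct
{
    int values[capacity];
    std::size_t _size = 0;

    std::size_t size() const noexcept { return _size; }

    void push(int x)
    {
        values[_size] = x;
        ++_size;
    }
};

// Variant 2 (hypothetical name): keep calling size(), but ask GCC to
// always inline it via the GNU attribute mentioned in the question.
struct vec_forced_inline
{
    int values[capacity];
    std::size_t _size = 0;

    __attribute__((always_inline)) std::size_t size() const noexcept
    {
        return _size;
    }

    void push(int x)
    {
        values[size()] = x;
        ++_size;
    }
};

// Hypothetical helper mirroring the question's test loop.
template <typename V>
void fill(V& v)
{
    for (std::size_t i{0}; i != capacity; ++i)
    {
        v.push(static_cast<int>(i));
    }
}
```

Both variants behave identically; they differ only in whether the accessor call is still present when the vectorizer analyzes push().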


Is this a gcc bug? Or is there a reason why a simple accessor such as size() would prevent auto-vectorization unless always_inline is explicitly added?

asked Feb 13 '18 at 13:02 by Vittorio Romeo


1 Answer

The loop in your example is vectorised with GCC < 7.1 but not with GCC >= 7.1, so the behaviour changed in 7.1.

We can look at the compiler optimisation report by adding -fopt-info-vec-all to the command line:

For GCC 7.3:

<source>:24:29: note: === vect_pattern_recog ===
<source>:24:29: note: === vect_analyze_data_ref_accesses ===
<source>:24:29: note: not vectorized: complicated access pattern.
<source>:24:29: note: bad data access.
<source>:21:5: note: vectorized 0 loops in function.

For GCC 6.3:

<source>:24:29: note: === vect_pattern_recog ===
<source>:24:29: note: === vect_analyze_data_ref_accesses ===
<source>:24:29: note: === vect_mark_stmts_to_be_vectorized ===
[...]
<source>:24:29: note: LOOP VECTORIZED
<source>:21:5: note: vectorized 1 loops in function.

So GCC 7.x decides not to vectorise the loop because of a "complicated access pattern", which might be caused by the size() call not yet being inlined at that point. Forcing inlining, or inlining by hand, fixes that, and GCC 6.x seems to manage it on its own. The final assembly does look as if size() was eventually inlined in both cases, but in GCC 7.x perhaps only after the vectorisation step (this is a guess on my part).

I wondered why you put the asm volatile(...) line at the end; presumably it keeps the compiler from throwing away the whole loop, which has no observable effect in this test case. If we simply return the last element of v instead, we achieve the same thing without any side effects the asm statement might have on how the compiler treats the memory of v:

return v.values[capacity - 1];

The code now vectorises with GCC 7.x, as it already did with GCC 6.x:

<source>:24:29: note: === vect_pattern_recog ===
<source>:24:29: note: === vect_analyze_data_ref_accesses ===
<source>:24:29: note: === vect_mark_stmts_to_be_vectorized ===
[...]
<source>:24:29: note: LOOP VECTORIZED
<source>:21:5: note: vectorized 1 loops in function.
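For reference, the modified test case might look like this as a complete unit. The answer only shows the return statement, so wrapping it in a function named fill_and_return_last is an assumption made here; the vec type is the one from the question.

```cpp
#include <cstddef>

constexpr std::size_t capacity = 1000;

// The fixed-capacity vector from the question, unchanged.
struct vec
{
    int values[capacity];
    std::size_t _size = 0;

    std::size_t size() const noexcept { return _size; }

    void push(int x)
    {
        values[size()] = x;
        ++_size;
    }
};

// Hypothetical wrapper: fills a local vec and returns its last element,
// giving the loop an observable result without an asm volatile barrier.
int fill_and_return_last()
{
    vec v;
    for (std::size_t i{0}; i != capacity; ++i)
    {
        v.push(static_cast<int>(i));
    }
    return v.values[capacity - 1];
}
```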

So what's the conclusion here?

  • something changed in GCC 7.1
  • best guess: a side effect of the asm volatile interferes with the inlining of size(), which in turn prevents vectorisation

Whether this is a bug, and if so whether in 6.x or 7.x, depends on what behaviour is desired for the asm volatile() construct; that is a question for the GCC developers.

Also, depending on your hardware, try adding -mavx2 or -mavx512f -mavx512cd (or -march=native, etc.) to the command line to get vectorisation with registers wider than the 128-bit xmm ones, i.e. the 256-bit ymm and 512-bit zmm registers.
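To confirm which instruction sets a given set of flags actually enabled, one option is to test the compiler's predefined macros. The sketch below assumes GCC or Clang on x86, where __SSE2__, __AVX2__ and __AVX512F__ are defined when the corresponding extension is enabled by -m flags or implied by -march; the function name widest_simd_target is hypothetical.

```cpp
#include <string>

// Hypothetical helper: reports the widest x86 SIMD extension enabled at
// compile time, based on macros predefined by GCC/Clang for the active
// -m / -march flags. On non-x86 targets none of these macros is defined.
std::string widest_simd_target()
{
#if defined(__AVX512F__)
    return "avx512f (512-bit zmm)";
#elif defined(__AVX2__)
    return "avx2 (256-bit ymm)";
#elif defined(__SSE2__)
    return "sse2 (128-bit xmm)";
#else
    return "none detected";
#endif
}
```

Compiling the same file with and without -mavx2 should change the reported value, which is a quick way to check that the flag reached the compiler.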

answered Oct 20 '22 at 03:10 by noma