Consider this minimal implementation of a fixed vector<int>:
#include <cstddef>

constexpr std::size_t capacity = 1000;
struct vec
{
    int values[capacity];
    std::size_t _size = 0;
    std::size_t size() const noexcept
    {
        return _size;
    }
    void push(int x)
    {
        values[size()] = x;
        ++_size;
    }
};
Given the following test case:
vec v;
for(std::size_t i{0}; i != capacity; ++i)
{
    v.push(i);
}
asm volatile("" : : "g"(&v) : "memory");
The compiler produces non-vectorized assembly: live example on godbolt.org
If I make any of the following changes...
values[size()] -> values[_size]
Add __attribute__((always_inline)) to size()
...then the compiler produces vectorized assembly: live example on godbolt.org
Is this a gcc bug? Or is there a reason why a simple accessor such as size() would prevent auto-vectorization unless always_inline is explicitly added?
The loop in your example is vectorised for GCC < 7.1, and not vectorized for GCC >= 7.1. So there seems to be some change in behaviour here.
We can look at the compiler optimisation report by adding -fopt-info-vec-all to the command line:
For GCC 7.3:
<source>:24:29: note: === vect_pattern_recog ===
<source>:24:29: note: === vect_analyze_data_ref_accesses ===
<source>:24:29: note: not vectorized: complicated access pattern.
<source>:24:29: note: bad data access.
<source>:21:5: note: vectorized 0 loops in function.
For GCC 6.3:
<source>:24:29: note: === vect_pattern_recog ===
<source>:24:29: note: === vect_analyze_data_ref_accesses ===
<source>:24:29: note: === vect_mark_stmts_to_be_vectorized ===
[...]
<source>:24:29: note: LOOP VECTORIZED
<source>:21:5: note: vectorized 1 loops in function.
So GCC 7.x decides not to vectorise the loop because of a "complicated access pattern", which might be caused by the (at that point) non-inlined size() function. Forcing inlining, or inlining manually, fixes that. GCC 6.x seems to do that by itself. However, the assembly looks as if size() was eventually inlined in both cases, but perhaps only after the vectorisation step in GCC 7.x (this is me guessing).
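The two workarounds from the question can be written side by side as in the sketch below. The names vec_forced_inline, vec_manual and fill are made up for illustration, and __attribute__((always_inline)) is a GCC/Clang extension. Whether either variant is actually vectorised depends on the compiler version and flags, so it is worth checking the -fopt-info-vec-all output rather than assuming it.

```cpp
#include <cstddef>

constexpr std::size_t capacity = 1000;

// Variant 1: keep the accessor, but ask GCC/Clang to always inline it.
struct vec_forced_inline
{
    int values[capacity];
    std::size_t _size = 0;

    __attribute__((always_inline)) std::size_t size() const noexcept
    {
        return _size;
    }

    void push(int x)
    {
        values[size()] = x;
        ++_size;
    }
};

// Variant 2: inline manually by indexing with the member directly.
struct vec_manual
{
    int values[capacity];
    std::size_t _size = 0;

    void push(int x)
    {
        values[_size] = x;
        ++_size;
    }
};

// Hypothetical helper: the same fill loop as in the question's test case.
template <typename Vec>
void fill(Vec& v)
{
    for (std::size_t i{0}; i != capacity; ++i)
    {
        v.push(static_cast<int>(i));
    }
}
```

Both variants behave identically at the source level; the only difference is how early the compiler gets to see the plain values[_size] access.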
I wondered why you put the asm volatile(...) line at the end. It is probably there to prevent the compiler from throwing away the whole loop, because the loop has no observable effect in this test case. If we just return the last element of v instead, we can achieve the same without causing any possible side effects on the memory model for v.
return v.values[capacity - 1];
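Put together, the modified test case might look like the sketch below. The function name last_after_fill is hypothetical, and the static_cast is added here only to make the size_t-to-int conversion explicit; the struct is the one from the question.

```cpp
#include <cstddef>

constexpr std::size_t capacity = 1000;

struct vec
{
    int values[capacity];
    std::size_t _size = 0;

    std::size_t size() const noexcept
    {
        return _size;
    }

    void push(int x)
    {
        values[size()] = x;
        ++_size;
    }
};

// Hypothetical wrapper around the question's loop.
int last_after_fill()
{
    vec v;
    for (std::size_t i{0}; i != capacity; ++i)
    {
        v.push(static_cast<int>(i));
    }
    // Returning an element gives the loop an observable result without an
    // asm volatile barrier. An optimiser may still simplify the loop, so
    // check the generated assembly or the vectorisation report.
    return v.values[capacity - 1];
}
```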
The code now vectorises with GCC 7.x, as it already did with GCC 6.x:
<source>:24:29: note: === vect_pattern_recog ===
<source>:24:29: note: === vect_analyze_data_ref_accesses ===
<source>:24:29: note: === vect_mark_stmts_to_be_vectorized ===
[...]
<source>:24:29: note: LOOP VECTORIZED
<source>:21:5: note: vectorized 1 loops in function.
So what's the conclusion here?
asm volatile messes with inlining of size(), preventing vectorisation.
Whether or not this is a bug (it could be in either 6.x or 7.x, depending on what behaviour is desired for the asm volatile() construct) would be a question for the GCC developers.
Also: try adding -mavx2 or -mavx512f -mavx512cd (or -march=native etc.) to the command line, depending on your hardware, to get vectorisation beyond 128-bit xmm, i.e. ymm and zmm, registers.
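To double-check which of these instruction sets a given set of flags actually enabled, one option is to look at the predefined macros that GCC and Clang set. The sketch below uses a hypothetical helper, simd_register_bits, and relies on __AVX512F__, __AVX2__ and __SSE2__ being defined by those compilers. It only reports what is enabled at compile time, not which width the vectoriser ends up choosing for a particular loop.

```cpp
// Hypothetical helper: widest integer SIMD register width enabled by the
// current compiler flags, based on GCC/Clang predefined macros.
constexpr int simd_register_bits()
{
#if defined(__AVX512F__)
    return 512; // zmm registers, e.g. with -mavx512f
#elif defined(__AVX2__)
    return 256; // ymm registers, e.g. with -mavx2
#elif defined(__SSE2__)
    return 128; // xmm registers, the baseline on x86-64
#else
    return 0;   // not an x86 target with SSE2, or macros not provided
#endif
}
```

Because the function is constexpr, it can also be used in a static_assert to make a build fail when the expected flags are missing.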