GCC fails to optimize aligned std::array like C array

Here's some code which GCC 6 and 7 fail to optimize when using std::array:

#include <array>
#include <cstddef>

static constexpr std::size_t my_elements = 8;

class Foo
{
public:
#ifdef C_ARRAY
    typedef double Vec[my_elements] alignas(32);
#else
    typedef std::array<double, my_elements> Vec alignas(32);
#endif
    void fun1(const Vec&);
    Vec v1{{}};
};

void Foo::fun1(const Vec& __restrict__ v2)
{
    for (unsigned i = 0; i < my_elements; ++i)
    {
        v1[i] += v2[i];
    }
}

Compiling the above with g++ -std=c++14 -O3 -march=haswell -S -DC_ARRAY produces nice code:

    vmovapd ymm0, YMMWORD PTR [rdi]
    vaddpd  ymm0, ymm0, YMMWORD PTR [rsi]
    vmovapd YMMWORD PTR [rdi], ymm0
    vmovapd ymm0, YMMWORD PTR [rdi+32]
    vaddpd  ymm0, ymm0, YMMWORD PTR [rsi+32]
    vmovapd YMMWORD PTR [rdi+32], ymm0
    vzeroupper

That's basically two unrolled iterations of adding four doubles at a time via 256-bit registers. But if you compile without -DC_ARRAY, you get a huge mess starting with this:

    mov     rax, rdi
    shr     rax, 3
    neg     rax
    and     eax, 3
    je      .L7

The code generated in this case (using std::array instead of a plain C array) appears to check the alignment of the input array at run time, even though the typedef specifies 32-byte alignment.

It seems that GCC doesn't understand that the contents of an std::array are aligned the same as the std::array itself. This breaks the assumption that using std::array instead of C arrays does not incur a runtime cost.
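To make the alignment claim concrete, here is a minimal sketch that checks at run time whether the element storage of an over-aligned std::array starts on the same 32-byte boundary as the array object. The names is_aligned, AlignedHolder and storage_shares_alignment are hypothetical and not part of the original code. The sketch applies alignas to the member rather than to a typedef, and it assumes (as holds for the common standard library implementations) that data() points at the start of the std::array object.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical stand-in for Foo: the member itself is over-aligned to 32 bytes.
struct AlignedHolder
{
    alignas(32) std::array<double, 8> v{};
};

// True when both the std::array object and its element storage
// start on a 32-byte boundary.
inline bool storage_shares_alignment()
{
    AlignedHolder h;
    return is_aligned(&h.v, 32) && is_aligned(h.v.data(), 32);
}
```

If storage_shares_alignment() returns true, the run-time alignment check in the generated code is redundant for such objects; the open question is only whether the optimizer can prove that.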

Is there something simple I'm missing that would fix this? So far I came up with an ugly hack:

void Foo::fun2(const Vec& __restrict__ v2)
{
    typedef double V2 alignas(Foo::Vec);
    const V2* v2a = static_cast<const V2*>(&v2[0]);

    for (unsigned i = 0; i < my_elements; ++i)
    {
        v1[i] += v2a[i];
    }
}

Also note that the problem does not occur if my_elements is 4 instead of 8, or if you use Clang.

You can see it live here: https://godbolt.org/g/IXIOst

927

asked Apr 27 '17 08:04

John Zwinck

1 Answer

Interestingly, if you replace the loop body with v1._M_elems[i] += v2._M_elems[i]; (which is obviously not portable, since _M_elems is a libstdc++ implementation detail), gcc manages to optimize the std::array case as well as the case of the C array.

Possible interpretation: in the gcc dumps (-fdump-tree-all-all), one can see MEM[(struct FooD.25826 *)this_7(D) clique 1 base 0].v1D.25832[i_15] in the C array case, and MEM[(const value_typeD.25834 &)v2_7(D) clique 1 base 1][_1] for std::array. That is, in the second case, gcc may have forgotten that this is part of type Foo and only remembers that it is accessing a double.

This is an abstraction penalty that comes from all the inline functions one has to go through to finally see the array access. Clang still manages to vectorize nicely (even after removing alignas!). This likely means that clang vectorizes without caring about alignment, and indeed it uses instructions like vmovupd that do not require an aligned address.

The hack you found, casting the element pointer to a pointer to an over-aligned double (V2), is another way to let the compiler see, when it handles the memory access, that the data being accessed is aligned. With the regular std::array::operator[], the memory access happens inside a member function of std::array, which has no way of knowing that *this happens to be aligned.

GCC also has a builtin to let the compiler know about alignment:

const double* v2a = static_cast<const double*>(__builtin_assume_aligned(v2.data(), 32));
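As a rough sketch of how the builtin might be used in a complete function, the following applies it to both operands. The names add_assume_aligned and demo_sum are hypothetical, and the sketch uses free functions instead of the Foo class from the question. It assumes the caller guarantees 32-byte alignment; if that promise is broken, the behavior is undefined. Whether GCC then emits the aligned vmovapd sequence depends on the compiler version and flags.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t my_elements = 8;
using Vec = std::array<double, my_elements>;

// Adds src into dst element-wise.
// Assumption: both arrays start on a 32-byte boundary (caller's responsibility).
inline void add_assume_aligned(Vec& dst, const Vec& src)
{
    // __builtin_assume_aligned is a GCC/Clang builtin; it returns void*.
    double* d = static_cast<double*>(__builtin_assume_aligned(dst.data(), 32));
    const double* s = static_cast<const double*>(__builtin_assume_aligned(src.data(), 32));

    for (std::size_t i = 0; i < my_elements; ++i)
    {
        d[i] += s[i];
    }
}

// Hypothetical driver: over-aligns two local arrays, adds them,
// and returns the result by value.
inline Vec demo_sum()
{
    alignas(32) Vec a{{1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0}};
    alignas(32) Vec b{{10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0}};
    add_assume_aligned(a, b);
    return a;
}
```

Since C++20 the standard library offers std::assume_aligned<32>(ptr) in <memory>, which expresses the same promise without a compiler-specific builtin.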

110

answered Oct 08 '22 13:10

Marc Glisse

Tags: c++, optimization, gcc, simd, memory-alignment
