According to this question, I thought that in C++17 a `std::vector` with the default allocator should handle over-aligned types. However, the following code
```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <vector>

template<typename T, std::size_t N, std::size_t Alignment>
struct alignas(Alignment) AlignedArray : public std::array<T, N>
{
    friend std::ostream& operator<<(std::ostream& o, const AlignedArray& a)
    {
        std::copy(a.cbegin(), a.cend(), std::ostream_iterator<T>(o, " "));
        return o;
    }
};

int main()
{
    using Array = AlignedArray<double, 24, 64>;
    std::vector<Array> v(10);
    for(const auto& e : v)
    {
        auto arr(e);  // copy of the 64-byte-aligned element: this is where it crashes
        std::cout << arr << std::endl;
    }
    return 0;
}
```
segfaults on the creation of `arr` when I compile it with clang 6.0.1 and `-mavx`. Without the `-mavx` switch it runs fine (the CPU is an E5-2697 v2). I compiled it with

```
clang++ -I<path_to_libcxx>/include/c++/v1 -g -mavx -std=c++17 main.cpp -stdlib=libc++ -lc++abi -o alignastest -L<path_to_libcxx>/lib -L<path_to_libcxxabi>/lib
```
I am running this on an old RHEL 6.9 system, on which I compiled clang 6.0.1, libcxx and libcxxabi myself.
I tested it on another system (Ubuntu 18.10, gcc 8) and it works without any problems.
Regarding the alignment, I found out that the implementation of `std::aligned_alloc` in libc++ relies on a C11 feature that is only enabled with a recent glibc version (`__config.h`):

```cpp
#if __GLIBC_PREREQ(2, 17)
#define _LIBCPP_HAS_C11_FEATURES
#endif
```
Unfortunately, RHEL 6.9 only has glibc 2.12 installed (`ldd (GNU libc) 2.12`). Does `alignas` also depend on the glibc version?
I have found the problem in the compiled code, but I have not found a solution yet. However, it seems that this is only a clang issue, and using g++ fixes it.
The problem is best illustrated by showing some of the resulting assembly code. The `auto arr(e);` line is compiled into a series of move instructions that copy the data from the vector to the stack. When compiling with `-mavx`, clang uses AVX instructions like the following (AT&T syntax):

```asm
vmovaps 0xa0(%rax),%ymm0
vmovaps %ymm0,0x120(%rsp)
...
```
Here `%rax` is the address of the current array in the vector, and the target `arr` is located at `0x80(%rsp)`. The program copies the data in 32-byte chunks (256-bit AVX instructions).
However, the problem becomes clear when looking at the values: in my debugging session `%rax = 0x55555556be70`. `vmovaps` (move aligned packed single-precision values) with a 256-bit `ymm` register requires its memory operand to be aligned on a 32-byte (0x20) boundary, but `%rax` is only 16-byte aligned. When compiling without the `alignas`, clang uses `vmovups` instead, which performs the same move but does not require the data to be aligned.
So the issue is that the allocator of `std::vector` does not respect the `alignas` and does not place the array on a 64-byte boundary. g++ does not align the array inside the vector to 32-byte boundaries either, and here it does not use AVX instructions unless optimization is enabled (any `-O` level other than `-O0`). However, g++ always uses the 128-bit `xmm` registers, which only require 16-byte alignment, and the allocator provides 16-byte alignment with both compilers.
EDIT:
I just realized that I forgot to compile with `-std=c++17`. With that flag it works for me with clang++: the code looks the same, but the allocator correctly aligns the data on a 64-byte boundary. So I guess it has to do with an old library. Maybe you can send me your binary; then I could take a more detailed look at it.