The following question is related, but its answers are old, and a comment from user Marc Glisse suggests that C++17 introduced new approaches to this problem that may not be adequately discussed there.
I'm trying to get aligned memory working properly for SIMD, while still having access to all of the data.
On Intel, if I create a float vector of type __m256, and reduce my size by a factor of 8, it gives me aligned memory.
E.g. std::vector<__m256> mvec_a((N*M)/8);
In a slightly hacky way, I can cast pointers to vector elements to float, which allows me to access individual float values.
Instead, I would prefer to have an std::vector<float> which is correctly aligned, and thus can be loaded into __m256 and other SIMD types without segfaulting.
I've been looking into aligned_alloc.
This can give me a C-style array that is correctly aligned:
auto align_sz = static_cast<std::size_t> (32);
float* marr_a = (float*)aligned_alloc(align_sz, N*M*sizeof(float));
However, I'm unsure how to do this for std::vector<float>. Giving the std::vector<float> ownership of marr_a doesn't seem to be possible.
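As a stopgap that stays with aligned_alloc, ownership of the buffer can at least be handed to a std::unique_ptr whose deleter calls std::free. What follows is only a sketch under a few assumptions: FreeDeleter, AlignedFloatArray and makeAlignedFloats are made-up names, the 32-byte default is taken from the __m256 use case, and std::aligned_alloc is not provided by every standard library (MSVC's, for instance, does not offer it). It does not produce a std::vector<float>; it only removes the manual free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Deleter that releases memory obtained from std::aligned_alloc.
struct FreeDeleter
{
    void operator()( void* pointer ) const noexcept { std::free( pointer ); }
};

// Hypothetical alias: an owning, array-style handle to aligned floats.
using AlignedFloatArray = std::unique_ptr<float[], FreeDeleter>;

// Hypothetical helper: allocates room for at least `count` floats.
// `alignment` is assumed to be a power of two supported by the platform.
inline AlignedFloatArray
makeAlignedFloats( std::size_t count, std::size_t alignment = 32 )
{
    assert( alignment != 0 && ( alignment & ( alignment - 1 ) ) == 0 );

    // C11 asks for the size to be a multiple of the alignment, so the byte
    // count is rounded up. Overflow of count * sizeof(float) is not checked.
    std::size_t const bytes = count * sizeof( float );
    std::size_t padded = ( bytes + alignment - 1 ) / alignment * alignment;
    if ( padded == 0 ) {
        padded = alignment;  // avoid a zero-sized request
    }

    auto* const raw = static_cast<float*>( std::aligned_alloc( alignment, padded ) );
    if ( raw == nullptr ) {
        throw std::bad_alloc();
    }
    return AlignedFloatArray( raw );
}
```

A call such as `auto marr_a = makeAlignedFloats( N * M );` then gives indexed access through `marr_a[i]` and frees the buffer automatically when `marr_a` goes out of scope.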
I've seen some suggestions that I should write a custom allocator, but this seems like a lot of work, and perhaps with modern C++ there is a better way?
The standard library containers take an allocator template argument which can be used to align their internal buffers. The specified allocator type has to implement at least allocate, deallocate, and value_type.
In contrast to these answers, this implementation of such an allocator avoids platform-dependent aligned malloc calls. Instead, it uses the C++17 aligned new operator.
Here is the full example on godbolt.
#include <cstddef>
#include <limits>
#include <new>

/**
 * Returns aligned pointers when allocations are requested. Default alignment
 * is 64B = 512b, sufficient for AVX-512 and most cache line sizes.
 *
 * @tparam ALIGNMENT_IN_BYTES Must be a positive power of 2.
 */
template<typename ElementType,
         std::size_t ALIGNMENT_IN_BYTES = 64>
class AlignedAllocator
{
private:
    static_assert(
        ALIGNMENT_IN_BYTES >= alignof( ElementType ),
        "Beware that types like int have minimum alignment requirements "
        "or access will result in crashes."
    );

public:
    using value_type = ElementType;
    static std::align_val_t constexpr ALIGNMENT{ ALIGNMENT_IN_BYTES };

    /**
     * This is only necessary because AlignedAllocator has a second template
     * argument for the alignment that will make the default
     * std::allocator_traits implementation fail during compilation.
     * @see https://stackoverflow.com/a/48062758/2191065
     */
    template<class OtherElementType>
    struct rebind
    {
        using other = AlignedAllocator<OtherElementType, ALIGNMENT_IN_BYTES>;
    };

public:
    constexpr AlignedAllocator() noexcept = default;

    constexpr AlignedAllocator( const AlignedAllocator& ) noexcept = default;

    template<typename U>
    constexpr AlignedAllocator( AlignedAllocator<U, ALIGNMENT_IN_BYTES> const& ) noexcept
    {}

    [[nodiscard]] ElementType*
    allocate( std::size_t nElementsToAllocate )
    {
        /* Guard against overflow in the byte count computed below. */
        if ( nElementsToAllocate
             > std::numeric_limits<std::size_t>::max() / sizeof( ElementType ) ) {
            throw std::bad_array_new_length();
        }

        auto const nBytesToAllocate = nElementsToAllocate * sizeof( ElementType );
        return static_cast<ElementType*>(
            ::operator new[]( nBytesToAllocate, ALIGNMENT ) );
    }

    /* The second argument is the element count that was passed to allocate,
     * not a byte count. It is not needed here. */
    void
    deallocate( ElementType* allocatedPointer,
                [[maybe_unused]] std::size_t nElementsAllocated )
    {
        /* According to the C++20 draft n4868 § 17.6.3.3, the delete operator
         * must be called with the same alignment argument as the new expression.
         * The size argument can be omitted but if present must also be equal to
         * the one used in new. */
        ::operator delete[]( allocatedPointer, ALIGNMENT );
    }
};
This allocator can then be used like this:
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <vector>

template<typename T, std::size_t ALIGNMENT_IN_BYTES = 64>
using AlignedVector = std::vector<T, AlignedAllocator<T, ALIGNMENT_IN_BYTES> >;

int
main()
{
    AlignedVector<int, 1024> buffer( 3333 );
    /* The start address of the buffer must be a multiple of the alignment. */
    if ( reinterpret_cast<std::uintptr_t>( buffer.data() ) % 1024 != 0 ) {
        std::cerr << "Vector buffer is not aligned!\n";
        throw std::logic_error( "Faulty implementation!" );
    }

    std::cout << "Successfully allocated an aligned std::vector.\n";
    return 0;
}
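For the float case from the question, the same idea can be applied with a 32-byte alignment. The snippet below is a sketch rather than part of the answer's code: SimdAllocator, AlignedFloatVector, isAligned and everyAvxBlockIsAligned are hypothetical names, and the allocator is a condensed restatement of the technique above so that the snippet compiles on its own. It also adds the equality operators that the standard's allocator requirements list, which the allocator above leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Condensed, hypothetical variant of the allocator above (C++17 aligned new).
template<typename T, std::size_t ALIGNMENT_IN_BYTES = 32>
struct SimdAllocator
{
    static_assert( ALIGNMENT_IN_BYTES >= alignof( T ), "Alignment too small for T." );

    using value_type = T;

    template<typename U>
    struct rebind
    {
        using other = SimdAllocator<U, ALIGNMENT_IN_BYTES>;
    };

    constexpr SimdAllocator() noexcept = default;

    template<typename U>
    constexpr SimdAllocator( SimdAllocator<U, ALIGNMENT_IN_BYTES> const& ) noexcept {}

    [[nodiscard]] T*
    allocate( std::size_t count )
    {
        if ( count > std::numeric_limits<std::size_t>::max() / sizeof( T ) ) {
            throw std::bad_array_new_length();
        }
        return static_cast<T*>( ::operator new[](
            count * sizeof( T ), std::align_val_t{ ALIGNMENT_IN_BYTES } ) );
    }

    void
    deallocate( T* pointer, std::size_t ) noexcept
    {
        ::operator delete[]( pointer, std::align_val_t{ ALIGNMENT_IN_BYTES } );
    }
};

// The allocator is stateless, so instances with equal alignment are interchangeable.
template<typename T, typename U, std::size_t A>
constexpr bool
operator==( SimdAllocator<T, A> const&, SimdAllocator<U, A> const& ) noexcept
{
    return true;
}

template<typename T, typename U, std::size_t A>
constexpr bool
operator!=( SimdAllocator<T, A> const&, SimdAllocator<U, A> const& ) noexcept
{
    return false;
}

using AlignedFloatVector = std::vector<float, SimdAllocator<float, 32> >;

inline bool
isAligned( void const* pointer, std::size_t alignment )
{
    return reinterpret_cast<std::uintptr_t>( pointer ) % alignment == 0;
}

// Checks that each complete group of 8 floats (32 bytes) starts on a
// 32-byte boundary, which is what an aligned 256-bit load expects.
inline bool
everyAvxBlockIsAligned( AlignedFloatVector const& values )
{
    for ( std::size_t i = 0; i + 8 <= values.size(); i += 8 ) {
        if ( !isAligned( values.data() + i, 32 ) ) {
            return false;
        }
    }
    return true;
}
```

With such a vector, an aligned load like `_mm256_load_ps( values.data() + i )` should be safe whenever `i` is a multiple of 8 and at least 8 elements remain, because every reallocation goes through the allocator and keeps the 32-byte alignment. Individual floats stay accessible through the ordinary `values[i]`.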