
Modern approach to making std::vector allocate aligned memory

The following question is related; however, its answers are old, and a comment from user Marc Glisse suggests that since C++17 there are new approaches to this problem that might not be adequately discussed there.

I'm trying to get aligned memory working properly for SIMD, while still having access to all of the data.

On Intel, if I create a std::vector with element type __m256 and reduce its size by a factor of 8, I get aligned memory.

E.g. std::vector<__m256> mvec_a((N*M)/8);

In a slightly hacky way, I can cast pointers to the vector's elements to float*, which allows me to access individual float values.

Instead, I would prefer to have an std::vector<float> which is correctly aligned, and thus can be loaded into __m256 and other SIMD types without segfaulting.

I've been looking into aligned_alloc.

This can give me a C-style array that is correctly aligned:

constexpr std::size_t align_sz = 32;
// The requested size should be a multiple of align_sz; release with free().
float* marr_a = static_cast<float*>(aligned_alloc(align_sz, N*M*sizeof(float)));

However I'm unsure how to do this for std::vector<float>. Giving the std::vector<float> ownership of marr_a doesn't seem to be possible.
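For the C-style array itself, ownership can at least be automated with a smart pointer, even though this does not turn it into a std::vector. The following is a minimal sketch under some assumptions: the names FreeDeleter, AlignedFloats and makeAlignedFloats are hypothetical, std::aligned_alloc is assumed to be available (it is missing on some platforms, notably MSVC), and the byte count is rounded up to a multiple of the alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical deleter: memory obtained from std::aligned_alloc is
// released with std::free.
struct FreeDeleter
{
    void operator()( void* pointer ) const noexcept { std::free( pointer ); }
};

// Hypothetical alias for an owning pointer to an aligned float array.
using AlignedFloats = std::unique_ptr<float[], FreeDeleter>;

// Hypothetical helper: allocate `count` floats on an `alignment`-byte boundary.
// Returns an empty pointer if the allocation fails.
inline AlignedFloats makeAlignedFloats( std::size_t count, std::size_t alignment = 32 )
{
    // The alignment must be a power of two.
    assert( alignment != 0 && ( alignment & ( alignment - 1 ) ) == 0 );

    // Round the byte count up to a multiple of the alignment.
    std::size_t const bytes =
        ( count * sizeof( float ) + alignment - 1 ) / alignment * alignment;

    return AlignedFloats( static_cast<float*>( std::aligned_alloc( alignment, bytes ) ) );
}
```

The buffer is freed automatically when the AlignedFloats object goes out of scope, but it has a fixed size and none of the std::vector interface.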

I've seen some suggestions that I should write a custom allocator, but this seems like a lot of work, and perhaps with modern C++ there is a better way?

Prunus Persica asked Sep 13 '25 14:09


1 Answer

The standard library containers take an allocator template argument which can be used to align their internal buffers. The specified allocator type has to implement at least allocate, deallocate, and value_type.
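To make those required members concrete, here is a minimal sketch of an allocator that std::vector accepts, without any alignment handling yet. The names MinimalAllocator and sumWithMinimalAllocator are hypothetical. The converting constructor and the comparison operators are included because the standard's allocator requirements expect them in addition to the three members named above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical name: a bare-bones allocator that std::vector will accept.
template<typename T>
struct MinimalAllocator
{
    using value_type = T;

    MinimalAllocator() noexcept = default;

    // Converting constructor so containers can rebind to other element types.
    template<typename U>
    MinimalAllocator( MinimalAllocator<U> const& ) noexcept {}

    T* allocate( std::size_t n )
    {
        // Plain operator new: default alignment only, no overflow check.
        return static_cast<T*>( ::operator new( n * sizeof( T ) ) );
    }

    void deallocate( T* p, std::size_t ) noexcept
    {
        ::operator delete( p );
    }
};

// Stateless allocators always compare equal.
template<typename T, typename U>
bool operator==( MinimalAllocator<T> const&, MinimalAllocator<U> const& ) noexcept
{
    return true;
}

template<typename T, typename U>
bool operator!=( MinimalAllocator<T> const&, MinimalAllocator<U> const& ) noexcept
{
    return false;
}

// Hypothetical helper: fill a vector that uses the allocator with 1..count
// and return the sum of its elements.
inline int sumWithMinimalAllocator( int count )
{
    assert( count >= 0 );
    std::vector<int, MinimalAllocator<int>> values;
    for ( int i = 1; i <= count; ++i ) {
        values.push_back( i );
    }
    int sum = 0;
    for ( int v : values ) {
        sum += v;
    }
    return sum;
}
```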

In contrast to those older answers, this implementation of such an allocator avoids platform-dependent aligned-malloc calls. Instead, it uses the aligned operator new introduced in C++17.
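As a rough illustration of that C++17 facility on its own, the sketch below calls the aligned allocation function directly and releases the memory with the matching aligned deallocation function. The helper names isAligned and alignedNewRoundTrip are hypothetical, and the 32-byte alignment is only an example value chosen to suit __m256.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Hypothetical helper: true if `pointer` is a multiple of `alignment` bytes.
inline bool isAligned( void const* pointer, std::size_t alignment )
{
    assert( alignment != 0 );
    return reinterpret_cast<std::uintptr_t>( pointer ) % alignment == 0;
}

// Hypothetical helper: allocate `count` floats on a 32-byte boundary, fill
// them, and report whether the buffer was aligned and readable.
inline bool alignedNewRoundTrip( std::size_t count )
{
    constexpr std::size_t alignmentInBytes = 32;
    constexpr std::align_val_t alignment{ alignmentInBytes };

    // C++17 aligned allocation function; throws std::bad_alloc on failure.
    void* const raw = ::operator new( count * sizeof( float ), alignment );
    bool const aligned = isAligned( raw, alignmentInBytes );

    // Construct the floats in the raw storage before reading them.
    float* const values = static_cast<float*>( raw );
    std::uninitialized_fill_n( values, count, 1.0F );
    bool const readable = ( count == 0 ) || ( values[count - 1] == 1.0F );

    // The deallocation function must receive the same alignment value.
    ::operator delete( raw, alignment );
    return aligned && readable;
}
```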

Here is the full example on godbolt.

#include <cstddef>
#include <limits>
#include <new>

/**
 * Returns aligned pointers when allocations are requested. Default alignment
 * is 64B = 512b, sufficient for AVX-512 and most cache line sizes.
 *
 * @tparam ALIGNMENT_IN_BYTES Must be a positive power of 2.
 */
template<typename    ElementType,
         std::size_t ALIGNMENT_IN_BYTES = 64>
class AlignedAllocator
{
private:
    static_assert(
        ALIGNMENT_IN_BYTES >= alignof( ElementType ),
        "Beware that types like int have minimum alignment requirements "
        "or access will result in crashes."
    );

public:
    using value_type = ElementType;
    static std::align_val_t constexpr ALIGNMENT{ ALIGNMENT_IN_BYTES };

    /**
     * This is only necessary because AlignedAllocator has a second template
     * argument for the alignment that will make the default
     * std::allocator_traits implementation fail during compilation.
     * @see https://stackoverflow.com/a/48062758/2191065
     */
    template<class OtherElementType>
    struct rebind
    {
        using other = AlignedAllocator<OtherElementType, ALIGNMENT_IN_BYTES>;
    };

public:
    constexpr AlignedAllocator() noexcept = default;

    constexpr AlignedAllocator( const AlignedAllocator& ) noexcept = default;

    template<typename U>
    constexpr AlignedAllocator( AlignedAllocator<U, ALIGNMENT_IN_BYTES> const& ) noexcept
    {}

    [[nodiscard]] ElementType*
    allocate( std::size_t nElementsToAllocate )
    {
        if ( nElementsToAllocate
             > std::numeric_limits<std::size_t>::max() / sizeof( ElementType ) ) {
            throw std::bad_array_new_length();
        }

        auto const nBytesToAllocate = nElementsToAllocate * sizeof( ElementType );
        return static_cast<ElementType*>(
            ::operator new[]( nBytesToAllocate, ALIGNMENT ) );
    }

    void
    deallocate(                  ElementType* allocatedPointer,
                [[maybe_unused]] std::size_t  nElementsAllocated )
    {
        /* According to the C++20 draft n4868 § 17.6.3.3, the delete operator
         * must be called with the same alignment argument as the new expression.
         * The size argument can be omitted but if present must also be equal to
         * the one used in new. */
        ::operator delete[]( allocatedPointer, ALIGNMENT );
    }
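};

/* Stateless allocators with the same alignment are interchangeable, so they
 * always compare equal. The allocator requirements expect these operators. */
template<typename T, typename U, std::size_t ALIGNMENT_IN_BYTES>
constexpr bool
operator==( AlignedAllocator<T, ALIGNMENT_IN_BYTES> const&,
            AlignedAllocator<U, ALIGNMENT_IN_BYTES> const& ) noexcept
{
    return true;
}

template<typename T, typename U, std::size_t ALIGNMENT_IN_BYTES>
constexpr bool
operator!=( AlignedAllocator<T, ALIGNMENT_IN_BYTES> const&,
            AlignedAllocator<U, ALIGNMENT_IN_BYTES> const& ) noexcept
{
    return false;
}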

This allocator can then be used like this:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <vector>

template<typename T, std::size_t ALIGNMENT_IN_BYTES = 64>
using AlignedVector = std::vector<T, AlignedAllocator<T, ALIGNMENT_IN_BYTES> >;

int
main()
{
    AlignedVector<int, 1024> buffer( 3333 );
    if ( reinterpret_cast<std::uintptr_t>( buffer.data() ) % 1024 != 0 ) {
        std::cerr << "Vector buffer is not aligned!\n";
        throw std::logic_error( "Faulty implementation!" );
    }

    std::cout << "Successfully allocated an aligned std::vector.\n";
    return 0;
}
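Applied to the std::vector<float> case from the question, the same idea might look like the sketch below, which checks that the buffer stays 32-byte aligned even as the vector reallocates while growing. CompactAlignedAllocator is a condensed, hypothetical stand-in for the AlignedAllocator above so that the snippet compiles on its own (the overflow check is omitted), and staysAlignedWhileGrowing is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Condensed, hypothetical stand-in for the AlignedAllocator above.
template<typename T, std::size_t Alignment>
struct CompactAlignedAllocator
{
    using value_type = T;

    // Needed because of the non-type template parameter.
    template<typename U>
    struct rebind { using other = CompactAlignedAllocator<U, Alignment>; };

    CompactAlignedAllocator() noexcept = default;

    template<typename U>
    CompactAlignedAllocator( CompactAlignedAllocator<U, Alignment> const& ) noexcept {}

    T* allocate( std::size_t n )
    {
        return static_cast<T*>(
            ::operator new( n * sizeof( T ), std::align_val_t{ Alignment } ) );
    }

    void deallocate( T* p, std::size_t ) noexcept
    {
        ::operator delete( p, std::align_val_t{ Alignment } );
    }
};

template<typename T, typename U, std::size_t A>
bool operator==( CompactAlignedAllocator<T, A> const&,
                 CompactAlignedAllocator<U, A> const& ) noexcept { return true; }

template<typename T, typename U, std::size_t A>
bool operator!=( CompactAlignedAllocator<T, A> const&,
                 CompactAlignedAllocator<U, A> const& ) noexcept { return false; }

// Hypothetical helper: grow a float vector element by element and check that
// its buffer starts on a 32-byte boundary after every push_back.
inline bool staysAlignedWhileGrowing( std::size_t finalSize )
{
    std::vector<float, CompactAlignedAllocator<float, 32>> values;
    for ( std::size_t i = 0; i < finalSize; ++i ) {
        values.push_back( static_cast<float>( i ) );
        if ( reinterpret_cast<std::uintptr_t>( values.data() ) % 32 != 0 ) {
            return false;
        }
    }
    return values.size() == finalSize;
}
```

With such a vector, an aligned load like _mm256_load_ps( values.data() + offset ) should be valid as long as offset is a multiple of 8 floats. Only the start of the buffer is aligned, so a size that is not a multiple of 8 still needs separate handling for the tail.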
mxmlnkn answered Sep 15 '25 04:09