
best cross-platform method to get aligned memory



As long as you're OK with having to call a special function to do the freeing, your approach is fine. I would do your #ifdefs the other way around, though: start with the standards-specified options and fall back to platform-specific ones. For example:

  1. If __STDC_VERSION__ >= 201112L use aligned_alloc.
  2. If _POSIX_VERSION >= 200112L use posix_memalign.
  3. If _MSC_VER is defined, use the Windows stuff.
  4. ...
  5. If all else fails, just use malloc/free and disable SSE/AVX code.
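The cascade above could be sketched roughly as follows. This is a minimal sketch, not a vetted implementation: the names portable_aligned_alloc, portable_aligned_free, is_aligned and the PORTABLE_* macros are hypothetical. Since __STDC_VERSION__ is a C macro, the C++ version below assumes __cplusplus >= 201703L as the test for std::aligned_alloc, and it excludes Windows toolchains from that branch on the assumption that they do not provide it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

#if defined(_WIN32)
#  include <malloc.h>   // _aligned_malloc, _aligned_free
#elif defined(__unix__) || defined(__APPLE__)
#  include <unistd.h>   // defines _POSIX_VERSION
#endif

// Pick a strategy, standard options first (macro names are hypothetical).
#if __cplusplus >= 201703L && !defined(_WIN32)
#  define PORTABLE_USE_STD_ALIGNED_ALLOC 1
#elif defined(_POSIX_VERSION) && _POSIX_VERSION >= 200112L
#  define PORTABLE_USE_POSIX_MEMALIGN 1
#elif defined(_WIN32)
#  define PORTABLE_USE_WINDOWS 1
#else
#  define PORTABLE_NO_ALIGNED_ALLOC 1   // callers should disable SSE/AVX paths
#endif

// `alignment` must be a power of two and at least sizeof(void*).
inline void* portable_aligned_alloc(std::size_t alignment, std::size_t size)
{
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
#if defined(PORTABLE_USE_STD_ALIGNED_ALLOC)
    // Round the size up to a multiple of the alignment for strict implementations.
    std::size_t rounded = (size + alignment - 1) / alignment * alignment;
    return std::aligned_alloc(alignment, rounded);
#elif defined(PORTABLE_USE_POSIX_MEMALIGN)
    void* p = nullptr;
    return posix_memalign(&p, alignment, size) == 0 ? p : nullptr;
#elif defined(PORTABLE_USE_WINDOWS)
    return _aligned_malloc(size, alignment);
#else
    (void)alignment;            // no aligned allocator found: plain malloc
    return std::malloc(size);
#endif
}

// Memory from _aligned_malloc must go to _aligned_free; the others accept free.
inline void portable_aligned_free(void* p)
{
#if defined(PORTABLE_USE_WINDOWS)
    _aligned_free(p);
#else
    std::free(p);
#endif
}

// True when the address of p is a multiple of alignment.
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Code that depends on alignment can test PORTABLE_NO_ALIGNED_ALLOC at compile time and switch off its SSE/AVX paths when it is defined.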

The problem is harder if you want to be able to pass the allocated pointer to free; that's valid on all the standard interfaces, but not on Windows and not necessarily with the legacy memalign function some unix-like systems have.


The first function you propose would indeed work fine.

Your "homebrew" function also works, but it has a drawback: if the pointer malloc returns is already aligned, you have just wasted 15 bytes. That may not matter sometimes, but the OS may well be able to provide correctly aligned memory without any waste, and if you need 256- or 4096-byte alignment, adding "alignment-1" bytes risks wasting a lot of memory.
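To put numbers on that waste, here is a small sketch of the arithmetic. The helper names padding_needed and padding_reserved are made up for illustration, and the alignment is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes to skip so that `address` moves up to the next multiple of `alignment`.
constexpr std::size_t padding_needed(std::uintptr_t address, std::size_t alignment)
{
    return static_cast<std::size_t>(
        (alignment - (address & (alignment - 1))) & (alignment - 1));
}

// Bytes an over-allocating scheme reserves up front, whatever malloc returns.
constexpr std::size_t padding_reserved(std::size_t alignment)
{
    return alignment - 1;
}

// An already aligned address needs no padding, yet 15 bytes were reserved.
static_assert(padding_needed(0x1000, 16) == 0, "aligned address needs no padding");
static_assert(padding_reserved(16) == 15, "16-byte alignment reserves 15 bytes");

// The worst case uses the whole reservation.
static_assert(padding_needed(0x1001, 16) == 15, "worst case needs alignment-1 bytes");

// Page alignment reserves almost a full page per allocation.
static_assert(padding_reserved(4096) == 4095, "4096-byte alignment reserves 4095 bytes");
```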


Here is a fixed version of user2093113's sample; the code as posted didn't build for me (it does arithmetic on a void*, whose size is unknown). I also put it in a template class that overrides operator new/delete, so you don't have to do the allocation and call placement new yourself.

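#include <cstddef>   // std::size_t
#include <cstdlib>   // std::malloc, std::free
#include <cstring>   // std::memcpy
#include <memory>    // std::align
#include <new>       // std::bad_alloc

template<std::size_t Alignment>
class Aligned
{
public:
    void* operator new(std::size_t size)
    {
        // Over-allocate: worst-case padding plus a slot for the original pointer.
        std::size_t space = size + (Alignment - 1);
        void *ptr = std::malloc(space + sizeof(void*));
        if (!ptr)
            throw std::bad_alloc();
        void *original_ptr = ptr;

        // Skip the slot reserved for the original pointer.
        char *ptr_bytes = static_cast<char*>(ptr);
        ptr_bytes += sizeof(void*);
        ptr = static_cast<void*>(ptr_bytes);

        ptr = std::align(Alignment, size, ptr, space);

        // Store the original pointer just before the aligned block.
        ptr_bytes = static_cast<char*>(ptr);
        ptr_bytes -= sizeof(void*);
        std::memcpy(ptr_bytes, &original_ptr, sizeof(void*));

        return ptr;
    }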

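    void operator delete(void* ptr)
    {
        if (!ptr)
            return;  // deleting a null pointer must be a no-op

        // The original malloc pointer sits just before the aligned block.
        char *ptr_bytes = static_cast<char*>(ptr);
        ptr_bytes -= sizeof(void*);

        void *original_ptr;
        std::memcpy(&original_ptr, ptr_bytes, sizeof(void*));

        std::free(original_ptr);
    }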
};

Use it like this:

class Camera : public Aligned<16>
{
};

I haven't tested how portable this code is yet.


If your compiler supports it, C++11 adds a std::align function to do runtime pointer alignment. You could implement your own malloc/free like this (untested):

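#include <cstddef>   // std::size_t
#include <cstdlib>   // std::malloc, std::free
#include <cstring>   // std::memcpy
#include <memory>    // std::align

template<std::size_t Align>
void *aligned_malloc(std::size_t size)
{
    // Over-allocate: worst-case padding plus a slot for the original pointer.
    std::size_t space = size + (Align - 1);
    void *ptr = std::malloc(space + sizeof(void*));
    if (!ptr)
        return nullptr;
    void *original_ptr = ptr;

    // Skip the slot reserved for the original pointer.
    char *ptr_bytes = static_cast<char*>(ptr);
    ptr_bytes += sizeof(void*);
    ptr = static_cast<void*>(ptr_bytes);

    ptr = std::align(Align, size, ptr, space);

    // Store the original pointer (its value, hence &original_ptr) just before the aligned block.
    ptr_bytes = static_cast<char*>(ptr);
    ptr_bytes -= sizeof(void*);
    std::memcpy(ptr_bytes, &original_ptr, sizeof(void*));

    return ptr;
}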

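void aligned_free(void* ptr)
{
    if (!ptr)
        return;  // mirror free(NULL), which is a no-op

    // Pointer arithmetic needs a char*, not a void*.
    char *ptr_bytes = static_cast<char*>(ptr);
    ptr_bytes -= sizeof(void*);

    void *original_ptr;
    std::memcpy(&original_ptr, ptr_bytes, sizeof(void*));

    std::free(original_ptr);
}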

Then you don't have to keep the original pointer value around to free it. Whether this is 100% portable I'm not sure, but I hope someone will correct me if not!
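With free functions like these, constructing an object still takes an explicit placement new and a matching destructor call. A rough sketch of how that could be wrapped follows. AlignedDeleter, make_aligned and Vec4 are hypothetical names, and the allocator pair is repeated in condensed form only so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <new>
#include <utility>

// Condensed stand-ins for the aligned_malloc/aligned_free pair.
template<std::size_t Align>
void* aligned_malloc(std::size_t size)
{
    std::size_t space = size + (Align - 1);
    void* original = std::malloc(space + sizeof(void*));
    if (!original) return nullptr;
    void* ptr = static_cast<char*>(original) + sizeof(void*);
    ptr = std::align(Align, size, ptr, space);
    std::memcpy(static_cast<char*>(ptr) - sizeof(void*), &original, sizeof(void*));
    return ptr;
}

inline void aligned_free(void* ptr)
{
    if (!ptr) return;
    void* original;
    std::memcpy(&original, static_cast<char*>(ptr) - sizeof(void*), sizeof(void*));
    std::free(original);
}

// Runs the destructor, then releases the aligned block.
template<typename T>
struct AlignedDeleter
{
    void operator()(T* p) const
    {
        p->~T();
        aligned_free(p);
    }
};

// Allocates aligned storage and constructs a T in it with placement new.
template<typename T, std::size_t Align, typename... Args>
std::unique_ptr<T, AlignedDeleter<T>> make_aligned(Args&&... args)
{
    void* raw = aligned_malloc<Align>(sizeof(T));
    if (!raw) throw std::bad_alloc();
    assert(reinterpret_cast<std::uintptr_t>(raw) % Align == 0);
    try {
        return std::unique_ptr<T, AlignedDeleter<T>>(
            new (raw) T(std::forward<Args>(args)...));
    } catch (...) {
        aligned_free(raw);   // constructor threw: give the storage back
        throw;
    }
}

// Example payload type for illustration.
struct Vec4
{
    float x, y, z, w;
    Vec4(float x_, float y_, float z_, float w_) : x(x_), y(y_), z(z_), w(w_) {}
};
```

A call such as make_aligned<Vec4, 32>(1.0f, 2.0f, 3.0f, 4.0f) then returns a smart pointer that destroys the object and frees the block when it goes out of scope.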