Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is dynamically allocated memory always 16 bytes aligned?

I wrote a simple example:

#include <iostream>

int main() {
    void* byte1 = ::operator new(1);
    void* byte2 = ::operator new(1);
    void* byte3 = malloc(1);
    std::cout << "byte1: " << byte1 << std::endl;
    std::cout << "byte2: " << byte2 << std::endl;
    std::cout << "byte3: " << byte3 << std::endl;
    return 0;
}

Running the example, I get the following results:

byte1: 0x1f53e70

byte2: 0x1f53e90

byte3: 0x1f53eb0

Each time I allocate a single byte of memory, it's always 16 bytes aligned. Why does this happen?

I tested this code on GCC 5.4.0 as well as GCC 7.4.0, and got the same results.

like image 906
jinge Avatar asked Nov 29 '19 02:11

jinge


People also ask

Is malloc 16 byte aligned?

The GNU documentation states that malloc is aligned to 16 byte multiples on 64 bit systems.

What is byte alignment in memory?

A memory access is said to be aligned when the data being accessed is n bytes long and the datum address is n-byte aligned. When a memory access is not aligned, it is said to be misaligned. Note that by definition byte memory accesses are always aligned.

Why is malloc aligned?

malloc has no knowledge of what it is allocating for because its parameter is just total size. It just aligns to an alignment that is safe for any object.

How dynamic memory is allocated?

In C, dynamic memory is allocated from the heap using some standard library functions. The two key dynamic memory functions are malloc() and free(). The malloc() function takes a single parameter, which is the size of the requested memory area in bytes. It returns a pointer to the allocated memory.


3 Answers

Why does this happen?

Because the standard says so. More specifically, it says that the dynamic allocations1 are aligned to at least the maximum fundamental2 alignment (it may have stricter alignment). There is a pre-defined macro (since C++17) just for the purpose of telling you exactly what this guaranteed alignment is: __STDCPP_DEFAULT_NEW_ALIGNMENT__. Why this might be 16 in your example... that is a choice of the language implementation, restricted by what is allowed by the target hardware architecture.

This is (was) a necessary design, considering that there is (was) no way to pass information about the needed alignment to the allocation function (until C++17 which introduced aligned-new syntax for the purpose of allocating "over-aligned" memory).

malloc doesn't know anything about the types of objects that you intend to create into the memory. One might think that new could in theory deduce the alignment since it is given a type... but what if you wanted to reuse that memory for other objects with stricter alignment, like for example in implementation of std::vector? And once you know the API of the operator new: void* operator new ( std::size_t count ), you can see that the type or its alignment are not an argument that could affect the alignment of the allocation.

1 Made by the default allocator, or malloc family of functions.

2 The maximum fundamental alignment is alignof(std::max_align_t). No fundamental type (arithmetic types, pointers) has stricter alignment than this.

like image 170
eerorika Avatar answered Nov 11 '22 20:11

eerorika


There are actually two reasons. The first reason is, that there are some alignment requirements for some kinds of objects. Usually, these alignment requirements are soft: A misaligned access is "just" slower (possibly by orders of magnitude). They can also be hard: On the PPC, for instance, you simply could not access a vector in memory if that vector was not aligned to 16 bytes. Alignment is not something optional, it is something that must be considered when allocating memory. Always.

Note that there is no way to specify an alignment to malloc(). There's simply no argument for it. As such, malloc() must be implemented to provide a pointer that is correctly aligned for any purposes on the platform. The ::operator new() in C++ follows the same principle.

How much alignment is needed is fully platform dependent. On a PPC, there is no way that you can get away with less than 16 bytes alignment. X86 is a bit more lenient in this, afaik.


The second reason is the inner workings of an allocator function. Typical implementations have an allocator overhead of at least 2 pointers: Whenever you request a byte from malloc() it will usually need to allocate space for at least two additional pointers to do its own bookkeeping (the exact amount depends on the implementation). On a 64 bit architecture, that's 16 bytes. As such, it is not sensible for malloc() to think in terms of bytes, it's more efficient to think in terms of 16 byte blocks. At least. You see that with your example code: The resulting pointers are actually 32 bytes apart. Each memory block occupies 16 bytes payload + 16 bytes internal bookkeeping memory.

Since the allocators request entire memory pages from the kernel (4096 bytes, 4096 bytes aligned!), the resulting memory blocks are naturally 16 bytes aligned on a 64 bit platform. It's simply not practical to provide less aligned memory allocations.


So, taken these two reasons together, it is both practical and required to provide seriously aligned memory blocks from an allocator function. The exact amount of alignment depends on the platform, but will usually not be less than the size of two pointers.

like image 23
cmaster - reinstate monica Avatar answered Nov 11 '22 20:11

cmaster - reinstate monica


It's probably the way the memory allocator manages to get the necessary information to the deallocation function: the issue of the deallocation function (like free or the general, global operator delete) is that there is exactly one argument, the pointer to the allocated memory and no indication of the size of the block that was requested (or the size that was allocated if it's larger), so that indication (and much more) needs to be provided in some other form to the deallocation function.

The most simple yet efficient approach is to allocate room for that additional information plus the requested bytes, and return a pointer to the end of the information block, let's call it IB. The size and alignment of IB automatically aligns the address returned by either malloc or operator new, even if you allocate a minuscule amount: the real amount allocated by malloc(s) is sizeof(IB)+s.

For such small allocations the approach is relatively wasteful and other strategies might be used, but having multiple allocation methods complicate deallocation as the function must first determine which method was used.

like image 42
curiousguy Avatar answered Nov 11 '22 20:11

curiousguy