I'm working on a memory pool for a small game engine.
The main use will be as segregated storage: a pool contains objects of a specific type and size. Currently the pools can be used to store anything, but allocations will be done in blocks of a specific size. Most of the memory needed will be allocated at once, but "overgrowth" can be enabled if needed to assist in tuning (almost fixed size).
Problem is, I started to get somewhat paranoid when thinking about memory alignment. I'm only used to raw memory management on 8-bit processors, where everything is byte-aligned.
I'm letting the user (me) specify the desired size of the blocks, which in the segregated storage case would be the size of the objects that I'm going to store in it.
The current approach is to allocate a chunk of memory blocks * (desired_size + header_size) bytes big and place the objects in it, with a header for each block; each object is positioned directly behind its header.
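A minimal sketch of that layout, using hypothetical names (pool_layout, pool_bytes, header_offset, object_offset) that are not from the original code. It only computes byte offsets from the start of the chunk, so it can show where objects land without casting a possibly misaligned pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the layout: `blocks` blocks, each one header
 * (header_size bytes) immediately followed by one object (desired_size bytes). */
typedef struct pool_layout {
    size_t blocks;
    size_t desired_size;
    size_t header_size;
} pool_layout;

/* Total bytes to request: blocks * (desired_size + header_size). */
static size_t pool_bytes(const pool_layout *l) {
    return l->blocks * (l->desired_size + l->header_size);
}

/* Byte offset of block i's header from the start of the chunk. */
static size_t header_offset(const pool_layout *l, size_t i) {
    assert(i < l->blocks);
    return i * (l->desired_size + l->header_size);
}

/* Byte offset of block i's object: directly behind its header. */
static size_t object_offset(const pool_layout *l, size_t i) {
    return header_offset(l, i) + l->header_size;
}
```

With a 4-byte header and 12-byte objects, the objects land at offsets 4, 20, 36 and 52: 4-byte aligned, but not 8-byte aligned, which is exactly the kind of case the alignment question is about.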
What do I need to consider with regards to memory alignment in my scenario?
The answer I've come up with so far is this: as long as desired_size represents n-byte aligned data, and the header as well as the actual data are correctly aligned and packed by the compiler, everything stored in the block will be n-byte aligned.
Here n is whatever boundary the platform requires. I'm targeting x86 for the moment, but I'd rather not make any assumptions about the platform in my code.
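The condition can be checked mechanically. A minimal sketch with a hypothetical helper objects_aligned, assuming the chunk itself starts on an n-byte boundary (as a malloc result does for n up to the maximum fundamental alignment): every object offset i * (desired_size + header_size) + header_size must then be a multiple of n.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumes the chunk start is already n-byte aligned; returns true if every
 * object in the chunk (which sits right behind its header) is n-byte aligned. */
static bool objects_aligned(size_t blocks, size_t desired_size,
                            size_t header_size, size_t n) {
    assert(n != 0);
    size_t stride = desired_size + header_size;
    for (size_t i = 0; i < blocks; ++i) {
        if ((i * stride + header_size) % n != 0)
            return false;
    }
    return true;
}
```

With two or more blocks the condition is also necessary: block 0 needs header_size to be a multiple of n, and block 1 then needs desired_size to be one too. Padding only the header, or only the object size, is not enough.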
Some of the resources I've used:
Edit: I uploaded some small sample code here, which may help anybody as confused as me in the future.
Alignment refers to the arrangement of data in memory, and specifically deals with the issue of accessing data as proper units of information from main memory. First we must conceptualize main memory as a contiguous block of consecutive memory locations. Each location contains a fixed number of bits.
Yes, both alignment and arrangement of your data can make a big difference in performance: not just a few percent, but anywhere from a few to many hundreds of percent. Take this loop: two instructions matter if you run it enough times. It's a performance test you can easily do yourself.
We care about alignment for a single reason: performance. In modern hardware, memory can only be accessed on particular boundaries. Trying to read data from an unaligned memory address can result in two reads from main memory, plus some logic to combine the data and present it to the user.
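To make the "two reads plus combining logic" concrete, here is a toy model. It is an assumption for illustration, not a description of any real memory controller: memory is an array of aligned, little-endian 32-bit words, and the hypothetical helper load32 assembles a 32-bit value at an arbitrary byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: memory can only be fetched as aligned little-endian 32-bit words.
 * Reading 32 bits at byte offset `off` needs one fetch if `off` is a multiple
 * of 4, and two fetches plus shifting and merging otherwise. */
static uint32_t load32(const uint32_t *words, size_t off, int *fetches) {
    assert(fetches != NULL);
    size_t index = off / 4;
    unsigned shift = (unsigned)(off % 4) * 8;
    uint32_t lo = words[index];                  /* first aligned fetch */
    if (shift == 0) {
        *fetches = 1;
        return lo;
    }
    uint32_t hi = words[index + 1];              /* second aligned fetch */
    *fetches = 2;
    return (lo >> shift) | (hi << (32 - shift)); /* combine the two parts */
}
```

Reading at offset 1 of the words 0x44332211 and 0x88776655 takes both fetches and yields 0x55443322; reading at offset 0 or 4 takes a single fetch.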
A pool allocator (or simply, a memory pool) is a variation of the fast bump allocator, which generally allows O(1) allocation: a free block is found right away, without searching a free list. To achieve this fast allocation, a pool allocator usually uses blocks of a predefined size.
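A minimal sketch of such a pool, assuming one malloc'd chunk and an intrusive free list threaded through the free blocks themselves (a common technique, not necessarily what any particular engine does). All names are hypothetical, and the "overgrowth" mode from the question is not implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fixed-size pool: free blocks are chained through their own storage,
 * so allocation and release are both O(1) with no search. */
typedef struct fixed_pool {
    unsigned char *memory;  /* one malloc'd chunk holding all blocks */
    void *free_list;        /* head of the singly linked list of free blocks */
} fixed_pool;

static int fixed_pool_init(fixed_pool *p, size_t block_size, size_t count) {
    size_t a = _Alignof(void *);
    p->memory = NULL;
    p->free_list = NULL;
    if (block_size < sizeof(void *))
        block_size = sizeof(void *);            /* room for the free-list link */
    block_size = (block_size + a - 1) / a * a;  /* keep each block pointer-aligned */
    if (count == 0 || block_size > SIZE_MAX / count)
        return 0;
    p->memory = malloc(block_size * count);
    if (p->memory == NULL)
        return 0;
    for (size_t i = count; i-- > 0;) {          /* push the last block first ... */
        void *block = p->memory + i * block_size;
        *(void **)block = p->free_list;
        p->free_list = block;                   /* ... so block 0 ends up at the head */
    }
    return 1;
}

static void *fixed_pool_alloc(fixed_pool *p) {
    void *block = p->free_list;
    if (block != NULL)
        p->free_list = *(void **)block;         /* pop the head: O(1) */
    return block;                               /* NULL when the pool is exhausted */
}

static void fixed_pool_free(fixed_pool *p, void *block) {
    assert(block != NULL);
    *(void **)block = p->free_list;             /* push back onto the list: O(1) */
    p->free_list = block;
}

static void fixed_pool_destroy(fixed_pool *p) {
    free(p->memory);
    p->memory = NULL;
    p->free_list = NULL;
}
```

Blocks here are only rounded to pointer alignment; objects with stricter requirements (double, SIMD types) need the block size and chunk start rounded to that stricter alignment instead.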
Allocations with malloc are guaranteed to be aligned for any type provided by the compiler, and hence for any object[*].
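In C11 terms, "any type" means any type with a fundamental alignment, and alignof(max_align_t) is the strictest of those. A small check with hypothetical helpers is_aligned and malloc_result_is_max_aligned, assuming a flat address space where converting a pointer to uintptr_t yields its address:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if p is a multiple of `align` (align must be a power of two). */
static int is_aligned(const void *p, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Allocates `size` bytes and reports whether the result is aligned for
 * max_align_t, i.e. for every fundamental type. */
static int malloc_result_is_max_aligned(size_t size) {
    void *p = malloc(size);
    if (p == NULL)
        return 0;
    int ok = is_aligned(p, alignof(max_align_t));
    free(p);
    return ok;
}
```

This is why the start of the pool's chunk is safe; the question is only what happens at offsets inside it.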
The danger is when your header has a smaller alignment requirement than the maximum alignment requirement for your implementation. Then its size might not be a multiple of the max. alignment, and so when you try to cast/use buf + header_size as a pointer to something that does have the max. alignment, it's misaligned. As far as C is concerned, that's undefined behaviour. On Intel it works but is slower. On some ARMs it causes a hardware exception. On some ARMs it silently gives the wrong answer. So if you don't want to make assumptions about the platform in your code, you must deal with it.
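A small illustration, assuming a hypothetical 4-byte header (small_header). An offset of sizeof(small_header) is not a multiple of alignof(max_align_t) on common platforms, so casting buf + header_size to, say, double * is not portable. read_double shows a standard C workaround that the answer does not mention: copying the bytes with memcpy is valid at any address. It avoids the undefined behaviour but does not fix the layout itself.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical small header: only as strictly aligned as uint32_t. */
typedef struct small_header {
    uint32_t flags;
} small_header;

static_assert(alignof(small_header) == alignof(uint32_t),
              "header is only as aligned as uint32_t");

/* Would buf + offset be aligned for every fundamental type, given that buf
 * itself came from malloc? */
static int offset_is_max_aligned(size_t offset) {
    return offset % alignof(max_align_t) == 0;
}

/* Reads a double from a possibly misaligned address without dereferencing
 * a misaligned double pointer. */
static double read_double(const unsigned char *src) {
    double d;
    memcpy(&d, src, sizeof d);
    return d;
}
```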
There are basically three tricks that I'm aware of to ensure that your header doesn't cause misalignment:
- Pad the header manually: "... put an int in as padding if necessary to make it an 8-multiple rather than just a 4-multiple".
- Use a union of every standard type, together with the struct that you actually want to use. This works well in C, but you'd have problems in C++ if your header isn't valid for membership of unions.

Alternatively, you can just define header_size not to be sizeof(header), but to be that size rounded up to a multiple of some chunky power of 2 that's "good enough". If you waste a bit of memory, so be it, and you can always have a "portability header" that defines this kind of thing in a way that isn't purely platform-independent, but makes it easy to adjust to new platforms.
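A sketch of the last two ideas in C11, with hypothetical names throughout (POOL_ALIGN, ROUND_UP, HEADER_SIZE, padded_header). POOL_ALIGN plays the role of the "portability header" setting; max_align_t stands in for the "union of every standard type", since C11 defines it as a type whose alignment is as strict as any fundamental type's.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Portability setting: the alignment every payload should satisfy.
 * alignof(max_align_t) is a portable default; override per platform. */
#ifndef POOL_ALIGN
#define POOL_ALIGN alignof(max_align_t)
#endif

/* Round n up to the next multiple of align (align must be a power of two). */
#define ROUND_UP(n, align) (((n) + (align) - 1) & ~((size_t)(align) - 1))

/* Hypothetical header the pool actually wants to use. */
typedef struct block_header {
    struct block_header *next;
    uint32_t size;
} block_header;

/* header_size is not sizeof(block_header) but that size rounded up, so the
 * payload at buf + HEADER_SIZE keeps POOL_ALIGN alignment. */
#define HEADER_SIZE ROUND_UP(sizeof(block_header), POOL_ALIGN)

static_assert((POOL_ALIGN & (POOL_ALIGN - 1)) == 0, "POOL_ALIGN must be a power of two");
static_assert(HEADER_SIZE % POOL_ALIGN == 0, "payload would be misaligned");

/* Union trick: a max_align_t member forces the union's alignment, and hence
 * its size, up to a multiple of the strictest fundamental alignment. */
typedef union padded_header {
    block_header h;
    max_align_t align_;  /* never accessed; only affects size and alignment */
} padded_header;
```

The static_asserts turn a wrong portability setting into a compile error rather than a misaligned pool.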
[*] with a common exception being over-sized SIMD types. Since they're non-standard, and it would be wasteful to 16-align every allocation just because of them, they get hand-waved aside, and you need special allocation functions for them.
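For over-aligned data such as 16-byte SIMD vectors, C11's aligned_alloc is one standard option. A minimal sketch with a hypothetical wrapper alloc_overaligned: C11 requires the size passed to aligned_alloc to be a multiple of the alignment, so the request is rounded up. MSVC's C runtime does not provide aligned_alloc; _aligned_malloc and _aligned_free fill that role there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for `count` objects of `size` bytes aligned to `align`
 * (a power of two, e.g. 16 for SSE vectors). Release the result with free(). */
static void *alloc_overaligned(size_t count, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;                            /* count * size would overflow */
    size_t total = count * size;
    if (total > SIZE_MAX - (align - 1))
        return NULL;                            /* rounding would overflow */
    total = (total + align - 1) / align * align;
    if (total == 0)
        total = align;
    return aligned_alloc(align, total);
}
```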
Your compiler will already align the members of objects and structures that will be stored in the pool, using the default packing that's appropriate for your architecture, usually 8 for a 32-bit core. You just need to make sure that the address you generate from your pool is aligned accordingly, which on a 32-bit operating system ought to be a multiple of 8 bytes.
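Making "the address you generate" aligned usually comes down to rounding an address up to a power-of-two boundary. A minimal sketch with a hypothetical helper align_up, assuming a flat address space where uintptr_t arithmetic mirrors addresses (true on x86 and ARM); the result is rebuilt from the original pointer so it stays within the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds p up to the next multiple of `align` (a power of two). The caller
 * must leave up to align - 1 spare bytes after p in the buffer. */
static void *align_up(void *p, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    uintptr_t aligned = (addr + (align - 1)) & ~(uintptr_t)(align - 1);
    return (unsigned char *)p + (aligned - addr);
}
```

A pool carving blocks out of a raw buffer would call this once on the buffer start and then keep its block stride a multiple of the same alignment.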
Mis-aligning the objects can be very expensive when they cross a CPU cache line boundary. A double that straddles two cache lines takes as much as three times as long to be read or written.
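Whether an object straddles a cache line can be computed directly. A sketch assuming 64-byte cache lines, which are common on current x86 CPUs but not guaranteed; CACHE_LINE and straddles_cache_line are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; adjust for the target. */
#define CACHE_LINE 64u

/* True if an object of `size` bytes at `addr` touches more than one line. */
static bool straddles_cache_line(uintptr_t addr, size_t size) {
    assert(size != 0);
    return addr / CACHE_LINE != (addr + size - 1) / CACHE_LINE;
}
```

Because 64 is a multiple of 8, an 8-byte double at an 8-byte-aligned address can never straddle a line under this assumption, which is one concrete payoff of keeping pool blocks aligned.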