I just finished a test as part of a job interview, and one question stumped me, even using Google for reference. I'd like to see what the StackOverflow crew can do with it:
The
memset_16aligned
function requires a 16-byte aligned pointer passed to it, or it will crash.a) How would you allocate 1024 bytes of memory, and align it to a 16 byte boundary?
b) Free the memory after thememset_16aligned
has executed.
{ void *mem; void *ptr; // answer a) here memset_16aligned(ptr, 0, 1024); // answer b) here }
3.6 Allocating Aligned Memory Blocks. The address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems). If you need a block whose address is a multiple of a higher power of two than that, use aligned_alloc or posix_memalign .
Alignment refers to the arrangement of data in memory, and specifically deals with the issue of accessing data as proper units of information from main memory. First we must conceptualize main memory as a contiguous block of consecutive memory locations. Each location contains a fixed number of bits.
Regular malloc aligns memory suitable for any object type (which, in practice, means that it is aligned to alignof(max_align_t)). This function is useful for over-aligned allocations, such as to SSE, cache line, or VM page boundary.
Alignment helps the CPU fetch data from memory in an efficient manner: less cache miss/flush, less bus transactions etc. Some memory types (e.g. RDRAM, DRAM etc.) need to be accessed in a structured manner (aligned "words" and in "burst transactions" i.e. many words at one time) in order to yield efficient results.
Someone else mentioned posix_memalign () as another way to get the aligned memory; that isn't available everywhere, but could often be implemented using this as a basis. Note that it was convenient that the alignment was a power of 2; other alignments are messier. One more comment — this code does not check that the allocation succeeded.
Since the minimal memory alignment of any allocator is 1 byte, 15=max (0,16-1) is a conservative answer. However, if you know your memory allocator is going to give you 32-bit int aligned addresses (which is fairly common), you could have used 12 as a pad.
Since malloc (or another dynamic memory allocator) is not necessarily guaranteed to align memory as we require, we’ll need to perform two extra steps: Request extra bytes and store the offset between our original pointer and our aligned pointer
In practice, aligned allocators often take a parameter for the alignment rather than it being hardwired. So the user will pass in the size of the struct they care about (or the least power of 2 greater than or equal to that) and all will be well.
{ void *mem = malloc(1024+16); void *ptr = ((char *)mem+16) & ~ 0x0F; memset_16aligned(ptr, 0, 1024); free(mem); }
{ void *mem = malloc(1024+15); void *ptr = ((uintptr_t)mem+15) & ~ (uintptr_t)0x0F; memset_16aligned(ptr, 0, 1024); free(mem); }
The first step is to allocate enough spare space, just in case. Since the memory must be 16-byte aligned (meaning that the leading byte address needs to be a multiple of 16), adding 16 extra bytes guarantees that we have enough space. Somewhere in the first 16 bytes, there is a 16-byte aligned pointer. (Note that malloc()
is supposed to return a pointer that is sufficiently well aligned for any purpose. However, the meaning of 'any' is primarily for things like basic types — long
, double
, long double
, long long
, and pointers to objects and pointers to functions. When you are doing more specialized things, like playing with graphics systems, they can need more stringent alignment than the rest of the system — hence questions and answers like this.)
The next step is to convert the void pointer to a char pointer; GCC notwithstanding, you are not supposed to do pointer arithmetic on void pointers (and GCC has warning options to tell you when you abuse it). Then add 16 to the start pointer. Suppose malloc()
returned you an impossibly badly aligned pointer: 0x800001. Adding the 16 gives 0x800011. Now I want to round down to the 16-byte boundary — so I want to reset the last 4 bits to 0. 0x0F has the last 4 bits set to one; therefore, ~0x0F
has all bits set to one except the last four. Anding that with 0x800011 gives 0x800010. You can iterate over the other offsets and see that the same arithmetic works.
The last step, free()
, is easy: you always, and only, return to free()
a value that one of malloc()
, calloc()
or realloc()
returned to you — anything else is a disaster. You correctly provided mem
to hold that value — thank you. The free releases it.
Finally, if you know about the internals of your system's malloc
package, you could guess that it might well return 16-byte aligned data (or it might be 8-byte aligned). If it was 16-byte aligned, then you'd not need to dink with the values. However, this is dodgy and non-portable — other malloc
packages have different minimum alignments, and therefore assuming one thing when it does something different would lead to core dumps. Within broad limits, this solution is portable.
Someone else mentioned posix_memalign()
as another way to get the aligned memory; that isn't available everywhere, but could often be implemented using this as a basis. Note that it was convenient that the alignment was a power of 2; other alignments are messier.
One more comment — this code does not check that the allocation succeeded.
Windows Programmer pointed out that you can't do bit mask operations on pointers, and, indeed, GCC (3.4.6 and 4.3.1 tested) does complain like that. So, an amended version of the basic code — converted into a main program, follows. I've also taken the liberty of adding just 15 instead of 16, as has been pointed out. I'm using uintptr_t
since C99 has been around long enough to be accessible on most platforms. If it wasn't for the use of PRIXPTR
in the printf()
statements, it would be sufficient to #include <stdint.h>
instead of using #include <inttypes.h>
. [This code includes the fix pointed out by C.R., which was reiterating a point first made by Bill K a number of years ago, which I managed to overlook until now.]
#include <assert.h> #include <inttypes.h> #include <stdio.h> #include <stdlib.h> #include <string.h> static void memset_16aligned(void *space, char byte, size_t nbytes) { assert((nbytes & 0x0F) == 0); assert(((uintptr_t)space & 0x0F) == 0); memset(space, byte, nbytes); // Not a custom implementation of memset() } int main(void) { void *mem = malloc(1024+15); void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F); printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr); memset_16aligned(ptr, 0, 1024); free(mem); return(0); }
And here is a marginally more generalized version, which will work for sizes which are a power of 2:
#include <assert.h> #include <inttypes.h> #include <stdio.h> #include <stdlib.h> #include <string.h> static void memset_16aligned(void *space, char byte, size_t nbytes) { assert((nbytes & 0x0F) == 0); assert(((uintptr_t)space & 0x0F) == 0); memset(space, byte, nbytes); // Not a custom implementation of memset() } static void test_mask(size_t align) { uintptr_t mask = ~(uintptr_t)(align - 1); void *mem = malloc(1024+align-1); void *ptr = (void *)(((uintptr_t)mem+align-1) & mask); assert((align & (align - 1)) == 0); printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr); memset_16aligned(ptr, 0, 1024); free(mem); } int main(void) { test_mask(16); test_mask(32); test_mask(64); test_mask(128); return(0); }
To convert test_mask()
into a general purpose allocation function, the single return value from the allocator would have to encode the release address, as several people have indicated in their answers.
Uri commented: Maybe I am having [a] reading comprehension problem this morning, but if the interview question specifically says: "How would you allocate 1024 bytes of memory" and you clearly allocate more than that. Wouldn't that be an automatic failure from the interviewer?
My response won't fit into a 300-character comment...
It depends, I suppose. I think most people (including me) took the question to mean "How would you allocate a space in which 1024 bytes of data can be stored, and where the base address is a multiple of 16 bytes". If the interviewer really meant how can you allocate 1024 bytes (only) and have it 16-byte aligned, then the options are more limited.
However, if the interviewer expected either of those responses, I'd expect them to recognize that this solution answers a closely related question, and then to reframe their question to point the conversation in the correct direction. (Further, if the interviewer got really stroppy, then I wouldn't want the job; if the answer to an insufficiently precise requirement is shot down in flames without correction, then the interviewer is not someone for whom it is safe to work.)
The title of the question has changed recently. It was Solve the memory alignment in C interview question that stumped me. The revised title (How to allocate aligned memory only using the standard library?) demands a slightly revised answer — this addendum provides it.
C11 (ISO/IEC 9899:2011) added function aligned_alloc()
:
7.22.3.1 The
aligned_alloc
functionSynopsis
#include <stdlib.h> void *aligned_alloc(size_t alignment, size_t size);
Description
Thealigned_alloc
function allocates space for an object whose alignment is specified byalignment
, whose size is specified bysize
, and whose value is indeterminate. The value ofalignment
shall be a valid alignment supported by the implementation and the value ofsize
shall be an integral multiple ofalignment
.Returns
Thealigned_alloc
function returns either a null pointer or a pointer to the allocated space.
And POSIX defines posix_memalign()
:
#include <stdlib.h> int posix_memalign(void **memptr, size_t alignment, size_t size);
DESCRIPTION
The
posix_memalign()
function shall allocatesize
bytes aligned on a boundary specified byalignment
, and shall return a pointer to the allocated memory inmemptr
. The value ofalignment
shall be a power of two multiple ofsizeof(void *)
.Upon successful completion, the value pointed to by
memptr
shall be a multiple ofalignment
.If the size of the space requested is 0, the behavior is implementation-defined; the value returned in
memptr
shall be either a null pointer or a unique pointer.The
free()
function shall deallocate memory that has previously been allocated byposix_memalign()
.RETURN VALUE
Upon successful completion,
posix_memalign()
shall return zero; otherwise, an error number shall be returned to indicate the error.
Either or both of these could be used to answer the question now, but only the POSIX function was an option when the question was originally answered.
Behind the scenes, the new aligned memory function do much the same job as outlined in the question, except they have the ability to force the alignment more easily, and keep track of the start of the aligned memory internally so that the code doesn't have to deal with specially — it just frees the memory returned by the allocation function that was used.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With