This is not homework, this is purely for my own personal education.
I couldn't figure out how to implement an aligned malloc so looked online and found this website. For the ease of reading I will post the code below:
#include <stdlib.h>
#include <stdio.h>
void* aligned_malloc(size_t required_bytes, size_t alignment)
{
void* p1; // original block
void** p2; // aligned block
int offset = alignment - 1 + sizeof(void*);
if ((p1 = (void*)malloc(required_bytes + offset)) == NULL)
{
return NULL;
}
p2 = (void**)(((size_t)(p1) + offset) & ~(alignment - 1));
p2[-1] = p1;
return p2;
}
void aligned_free(void *p)
{
free(((void**)p)[-1]);
}
void main (int argc, char *argv[])
{
char **endptr;
int *p = aligned_malloc (100, strtol(argv[1], endptr, 10));
printf ("%s: %p\n", argv[1], p);
aligned_free (p);
}
The implementation does work, but I honestly can't figure out how it works.
Here is what I can't understand:
~(alignment - 1)
accomplishp2
is a double pointer. How come we can return it from a function that is supposed to return only a single pointer?Any help is really appreciated.
EDIT
This is not a duplicate of How to allocate aligned memory only using the standard library? because I also need to know how to free aligned memory.
The aligned_alloc function allocates space for an object whose alignment is specified by alignment, whose size is specified by size, and whose value is indeterminate.
malloc has no knowledge of what it is allocating for because its parameter is just total size. It just aligns to an alignment that is safe for any object.
Since many CPUs as well as many operation system do have alignment requirements, most malloc implementation will always return aligned memory but which alignment rules it follows is system specific.
If you really want to implement your own malloc, have a look at the syscalls sbrk() and brk() that will help you modify the size of the heap. @Mike: You keep asking about compilation, but the undefined reference to malloc appears to be a link error. This is a common in embedded programming.
You need an offset if you want to support alignments beyond what your system's malloc()
does. For example if your system malloc()
aligns to 8 byte boundaries, and you want to align to 16 bytes, you ask for 15 bytes extra so you know for sure you can shift the result around to align it as requested. You also add sizeof(void*)
to the size you pass to malloc()
to leave room for bookkeeping.
~(alignment - 1)
is what guarantees the alignment. For example if alignment is 16, then subtract 1 to get 15, aka 0xF, then negating it makes 0xFF..FF0 which is the mask you need to satisfy the alignment for any returned pointer from malloc()
. Note that this trick assumes alignment is a power of 2 (which practically it normally would be, but there really should be a check).
It's a void**
. The function returns void*
. This is OK because a pointer to void is "A pointer to any type," and in this case that type is void*
. In other words, converting void*
to and from other pointer types is allowed, and a double-pointer is still a pointer.
The overall scheme here is to store the original pointer before the one that's returned to the caller. Some implementations of standard malloc()
do the same thing: stash bookkeeping information before the returned block. This makes it easy to know how much space to reclaim when free()
is called.
All that said, this sort of thing is usually not useful, because the standard malloc()
returns the largest alignment on the system. If you need alignment beyond that, there may be other solutions, including compiler-specific attributes.
implementation does work
Perhaps, but I wouldn't be too sure. IMO you'd be better off working from first principles. Right off the bat,
p1 = (void*)malloc
is a red flag. malloc
returns void
. In C, any pointer can be assigned from void *
. Casting from malloc
is usually considered bad form because any effect it has can only be bad.
Why we need an offset
The offset provides room to stash the pointer returned by malloc
, used later by free
.
p1
is retrieved from malloc
. Later, it has to be provide to free
to be released. aligned_malloc
reserves sizeof(void*)
bytes at p1
, stashes p1
there, and returns p2
(the first "aligned" address in the block that p1
points to). Later, when the caller passes p2
to aligned_free
, it converts p2
in effect to void *p2[]
, and fetches the original p1
using -1 as an index.
What does anding with ~(alignment - 1) accomplish
It's what puts p2
on the boundary. Say alignment is 16; alignment -1
is 15, 0xF. ~OxF
is all bits on except the last 4. For any pointer P
, P & ~0xF
will be a multiple of 16.
p2
is a double pointer.
pointer schmointer. malloc
returns void*
. It's a block of memory; you address it as you will. You wouldn't blink at
char **args = calloc(7, sizeof(char*));
to allocate an array of 7 char *
pointers, would you? The code picks some "aligned" location at least sizeof(void*)
bytes from p1
and, for the purposes of free
, treats it as void **
.
What is the general approach
There is no one answer. Best is probably to use a standard (or popular) library. If you build atop malloc
, allocating enough to keep the "real" pointer and returning an aligned one is pretty standard, although I would code it differently. The syscall mmap
returns a page-aligned pointer, which will satisfy most criteria for "aligned". Depending on need, that might be better or worse than piggybacking on malloc
.
Suppose we need SZ bytes of aligned memory, let:
A is the alignment.
W is the CPU word size.
P is the memory returned by malloc.
SZ is the requested number of bytes to be allocated.
we will return (P + Y) in which (P + Y) mod A = 0
So, we should save the original pointer P to be able to free the memory later. In this case, we should allocate (SZ + W) bytes, but to let memory aligned, we will substruct Z bytes in which (P % A = Z) => (Z ∈ [0, A-1])
So the total memory to be allocated is: SZ + W + MAX(Z) = SZ + W + A - 1
The pointer to be returned is P + Y = P + W + MAX(Z) - (P + W + MAX(Z)) mod A
WE HAVE: X - X mod A = INT(X / A) * A = X & ~(A - 1)
SO we can replace P + W + MAX(Z) - (P + W + MAX(Z)) mod A by (P + W + MAX(Z)) & ~(A - 1)
The memory to be returned is: (P + W + MAX(Z)) & ~(A - 1) = (P + W + A - 1) & ~(A - 1)
I have a few issues with this code. I have compiled them into the below list:
p1 = (void*)malloc
You do not cast the return value of malloc.free(((void**)p)[-1]);
You do not cast free.if ((p1 = (void*)malloc(required_bytes + offset)) == NULL)
Do not put an assignment inside the comparison of an if statement. I know a lot of people do this, but in my mind, that is just bad form and makes the code more difficult to read.What they are doing here is storing the original pointer inside the allocated block. That means that only the aligned pointer gets returned to the user. The actual pointer that is returned by malloc, the user never sees. You have to keep that pointer though because free needs it to unlink the block from the allocated list and put it on the free list. At the head of every memory block, malloc puts some housekeeping information there. Things such and next/prev pointers, size, allocation status, etc.... Some debug versions of malloc use guard words to check if something overflowed the buffer. The alignment that is passed to the routine MUST be a power of 2.
When I wrote my own version of malloc for use in a pooled memory allocator, the minimum block size that I used was 8 bytes. So including the header for a 32-bit system, the total was 28 bytes (20 bytes for the header). On a 64-bit system, it was 40 bytes (32 bytes for the header). Most systems have increased performance when data is aligned to some address value (either 4 or 8 bytes on modern computer systems). The reason for this is because the machine can grab the entire word in one bus cycle if it is aligned. If not, then it requires two bus cycles to get the entire word, then it has to construct it. This is why compilers align variables on either 4 or 8 bytes. This means that the last 2 or 3 bits of the address bus are zero.
I know that there are some hardware constraints which requires more alignment than the default 4 or 8. Nvidia's CUDA system, if I remember correctly, requires things aligned to 256 bytes...and that's a hardware requirement.
This has been asked before though. See: How to allocate aligned memory only using the standard library?
Hope this helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With