When the OS loads a process into memory, it initializes the stack pointer to the virtual address where it has decided the stack should go in the process's virtual address space, and program code uses this register to know where stack variables are. My question is: how does malloc() know at what virtual address the heap starts? Does the heap always exist at the end of the data segment, and if so, how does malloc() know where that is? Or is it not even one contiguous area of memory, but just randomly interspersed with other global variables in the data section?
In C, the library function malloc is used to allocate a block of memory on the heap. The program accesses this block of memory via a pointer that malloc returns. When the memory is no longer needed, the pointer is passed to free which deallocates the memory so that it can be used for other purposes.
Many malloc implementations store a small header (for example 8 or 16 bytes, depending on the implementation and platform) right in front of the pointer they return to "remember" the block's size. When you free that pointer, free reads the header just before it to get the size, then returns the block to the allocator's pool of free memory; the memory is not necessarily handed back to the operating system right away.
Data Storage
All memory allocated by malloc (or new in C++) is stored in heap memory. When malloc is called, the pointer it returns always points into heap memory.
Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2).
malloc implementations are dependent on the operating system; so is the process that they use to get the beginning of the heap. On UNIX, this can be accomplished by calling sbrk(0) at initialization time. On other operating systems the process is different.
Note that you can implement malloc without knowing the location of the heap. You can initialize the free list to NULL, and call sbrk or a similar function with the allocation size each time a free element of the appropriate size is not found.
This is only about Linux implementations of malloc.
Many malloc implementations on Linux or POSIX systems use the mmap(2) syscall to get a fairly big range of memory; they can then use munmap(2) to release it.
(It looks like sbrk(2) might not be used a lot any more; in particular, it is not ASLR friendly and might not be multi-thread friendly)
Both of these syscalls may be quite expensive, so some implementations ask for memory (using mmap) in quite large chunks (e.g. in chunks of one or a few megabytes). Then they manage the free space themselves, e.g. as linked lists of blocks. They handle small and large allocations differently.
The mmap syscall usually does not hand out memory ranges starting at some fixed address (notably because of ASLR).
Try running a simple program on your system that prints the result of a single malloc (of e.g. 128 ints). You will probably observe different addresses from one run to the next (because of ASLR). And strace(1)-ing it is very instructive. Also try cat /proc/self/maps (or print the lines of /proc/self/maps inside your program). See proc(5).
So there is no need to "start" the heap at some address, and on many systems that does not even make any sense. The kernel hands out segments of virtual address space at randomized page addresses.
BTW, both GNU libc and musl libc are free software. You should look inside the source code of their malloc implementations. I find that the source code of musl libc is very readable.