
How does malloc() know where the heap starts?

When the OS loads a process into memory, it initializes the stack pointer to the virtual address where it has decided the stack should go in the process's virtual address space, and program code uses this register to find stack variables. My question is: how does malloc() know at what virtual address the heap starts? Does the heap always exist at the end of the data segment, and if so, how does malloc() know where that is? Is it even one contiguous area of memory, or is it interspersed with other global variables in the data section?

mclaassen asked Sep 11 '14 17:09


People also ask

How does malloc access the heap?

In C, the library function malloc is used to allocate a block of memory on the heap. The program accesses this block of memory via a pointer that malloc returns. When the memory is no longer needed, the pointer is passed to free which deallocates the memory so that it can be used for other purposes.

How does malloc keep track of memory?

Many malloc implementations store a small header (8 bytes in some implementations; the exact size and layout vary) right in front of the pointer they return, to "remember" the block's size. When you free that pointer, free reads the header just before it to get the size info, then returns the block to the allocator's pool of free memory. The memory is not necessarily handed back to the operating system right away.

Does malloc always allocate on the heap?

Data storage: all memory allocated by malloc (or by new in C++) is stored in heap memory. When malloc succeeds, the pointer it returns will always be a pointer to “heap memory”.

Where does malloc get memory from?

Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2).


2 Answers

malloc implementations are dependent on the operating system; so is the process that they use to get the beginning of the heap. On UNIX, this can be accomplished by calling sbrk(0) at initialization time. On other operating systems the process is different.

Note that you can implement malloc without knowing the location of the heap. You can initialize the free list to NULL, and call sbrk or a similar function with the allocation size each time a free element of the appropriate size is not found.

Sergey Kalinichenko answered Nov 03 '22 01:11


This is only about Linux implementations of malloc.

Many malloc implementations on Linux or other POSIX systems use the mmap(2) syscall to get a fairly large range of memory; they can then use munmap(2) to release it.

(It looks like sbrk(2) might not be used a lot any more; in particular, it is not ASLR friendly and might not be multi-thread friendly)

Both these syscalls may be quite expensive, so some implementations request memory (using mmap) in quite large chunks (e.g. in chunks of one or a few megabytes). Then they manage free space as e.g. linked lists of blocks, etc. They handle small mallocs and large mallocs differently.

The mmap syscall usually does not hand out memory ranges starting at some fixed address (notably because of ASLR).

Try running a simple program on your system that prints the result of a single malloc (of e.g. 128 ints). You will probably observe different addresses from one run to the next (because of ASLR). strace(1)-ing it is also very instructive. Also try cat /proc/self/maps (or print the lines of /proc/self/maps inside your program). See proc(5).

So there is no need to "start" the heap at some address, and on many systems that does not even make sense. The kernel hands out segments of virtual address space at randomized page addresses.

BTW, both GNU libc and musl libc are free software. You should look inside the source code of their malloc implementation. I find that source code of musl libc is very readable.

Basile Starynkevitch answered Nov 03 '22 01:11