
Code description of ptmalloc implementation

I want to understand how dynamic memory management works at a low level on GNU/Linux systems (that is, how ptmalloc works).

Of course, I've read the code, but I still have a lot of doubts. I more or less understand the data structures, but there are many gaps in my knowledge!

My question is whether anyone knows of a resource that explains the implementation in detail. For example, I've read papers such as 'Understanding the heap by breaking it' and the 'Malloc Maleficarum' series and its follow-ups. They do a great job, but, of course, they focus more on exploitation than on explaining implementation details.

If you don't know of any such resource, here are some of my questions.

  • What really is an arena? In the code, the ar_ptr field of the heap_info struct has a comment saying 'arena for this heap', so an arena cannot be the same thing as a heap (as is claimed everywhere).

  • Why does the heap_info struct have a prev pointer but no next pointer? Is it because of main_arena? And what is main_arena?

  • Can a heap_info struct have more than one arena (that is, point to different malloc_state structures)?

  • When are new arenas created, and what code handles it? I've read that a new arena is created when the arena requested for storing data is locked (because the process or one of its threads is working with it), and I've also read that each thread of a process has a different arena. The important thing here is which code handles these situations.

  • I also don't understand what people mean when they say that all memory allocations originate from the top chunk, or wilderness chunk. Do you know where I can find this code?

BTW, I don't want to go deep into the mutex details.

I'm reviewing the ptmalloc implementation in glibc 2.12.1. I would like to draw some diagrams of the overall structure, so I need to understand these things!

Thank you.

newlog asked Jul 11 '12 17:07



2 Answers

OK, I've done some research and I have answers to many of these questions.

  • An arena is a memory region in which a process stores its dynamic data. In short, an arena is what used to be called simply the heap. Now that multithreaded programs want more than one heap per process, each one is called an arena, but conceptually an arena is still a heap. In the code, an arena is described by a malloc_state structure, and each heap_info structure describes one heap that belongs to an arena through its ar_ptr field.

  • I don't know why there is only a prev pointer. What I do know is that, normally, all dynamic data is stored in main_arena, the arena that is created for the process. I don't know under which circumstances main_arena is not used. What I do know is that if the size field of a memory chunk has the NON_MAIN_ARENA bit set, main_arena is not used: the heap_for_ptr() macro clears the low-order bits of the chunk pointer (the 20 least significant bits when heaps are 1 MiB) to find the heap_info of that chunk, and its ar_ptr field gives the arena. To sum up, under normal circumstances main_arena is always used.

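  • Not in the code itself: each heap_info struct has a single ar_ptr, so a heap belongs to exactly one arena, while one arena can own several heaps linked through prev. What a process can have is several arenas, and the reason is lock contention. If you have enough free time, you can read about this in [1].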

  • I don't remember this one. The fact is that if an arena is in use (locked), a new arena is created. Search for calls to a function named something like new_arena() or new_heap(); I remember the name being similar to that.

  • I think this only means that, at the beginning, all the memory of the heap is the top chunk (or wilderness chunk), so as the process requests memory, the top chunk is split and fragmented. So it all starts with the top chunk.
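The arena lookup described in the second bullet can be sketched in C. This is a toy model, not the glibc source: the toy_ names, the two structs, and the fixed 1 MiB heap size are assumptions made for illustration (the real HEAP_MAX_SIZE depends on the platform). It only shows the idea that a heap starts at an aligned address, so masking a chunk pointer recovers the heap header, whose ar_ptr names the owning arena.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 1 MiB heaps, so the mask clears the 20 low bits.
   The real HEAP_MAX_SIZE depends on the platform. */
#define TOY_HEAP_MAX_SIZE ((size_t)1 << 20)
/* Flag kept in the low bits of a chunk's size field. */
#define TOY_NON_MAIN_ARENA ((size_t)0x4)

struct toy_arena { int id; };          /* hypothetical stand-in for malloc_state */

struct toy_heap {                      /* hypothetical stand-in for heap_info */
    struct toy_arena *ar_ptr;          /* the one arena that owns this heap */
    struct toy_heap *prev;             /* previous heap of the same arena */
};

static struct toy_arena toy_main_arena = { 0 };

/* Round a pointer down to the start of its heap, in the spirit of heap_for_ptr(). */
static struct toy_heap *toy_heap_for_ptr(const void *p)
{
    return (struct toy_heap *)((uintptr_t)p & ~((uintptr_t)TOY_HEAP_MAX_SIZE - 1));
}

/* Pick the arena for a chunk: the main arena unless the flag bit is set. */
static struct toy_arena *toy_arena_for_chunk(const void *chunk, size_t size_field)
{
    if (size_field & TOY_NON_MAIN_ARENA)
        return toy_heap_for_ptr(chunk)->ar_ptr;
    return &toy_main_arena;
}

/* Returns 0 if the lookups behave as described, 1 if no memory was available. */
static int toy_demo(void)
{
    struct toy_arena second = { 1 };
    /* An aligned block plays the role of an mmap()ed heap. */
    struct toy_heap *heap = aligned_alloc(TOY_HEAP_MAX_SIZE, TOY_HEAP_MAX_SIZE);
    if (heap == NULL)
        return 1;
    heap->ar_ptr = &second;
    heap->prev = NULL;

    /* Any address inside the heap maps back to the same header. */
    const char *chunk = (const char *)heap + 4096;
    assert(toy_heap_for_ptr(chunk) == heap);
    assert(toy_arena_for_chunk(chunk, 0x20 | TOY_NON_MAIN_ARENA) == &second);
    assert(toy_arena_for_chunk(chunk, 0x20) == &toy_main_arena);

    free(heap);
    return 0;
}
```

Calling toy_demo() from a main() function exercises both paths. The design point is that no per-chunk arena pointer is needed: one flag bit plus the alignment of the heap is enough to find the arena.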

Without wanting to be pompous, I think that, after three months, my answer is the one that best fits my questions, so I'll mark it as the accepted one. That said, thanks for all the other answers. They have been really helpful.

BTW, I've put all this research in a paper, but since it is in Spanish, I don't think it will be of much use here, and I don't know whether linking it would be considered spam. [2]

[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.4439

[2] Here you have the paper: http://overflowedminds.net/papers/newlog/linux_heap_exploiting_revisited.pdf

newlog answered Nov 15 '22 15:11


A heap is basically divided into many small regions, each of which independently serves one or more allocated objects. Such a region might be called an arena or a zone. Primarily, a heap is a collection of arenas holding objects, with the requirement that one arena can be deallocated in a single operation. To make this possible, an entire arena is allocated as a single contiguous range of memory addresses.

The difference between an arena and a zone is somewhat grey. I am not sure about Linux, but one real-world example is Octeon, a multicore network processor family by Cavium Networks. It treats allocated memory as either an arena or a zone, the difference being that a zone allocates objects of one fixed size, whereas an arena can hold objects of different sizes. This naturally leads to fragmentation in the case of an arena. But I can't confirm whether that is the case with Linux too.
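The idea of a region that is released in a single operation can be sketched as a small bump allocator in C. This is a minimal illustration of region-based allocation in general, not of how ptmalloc or Octeon implement it; the region struct and the region_ functions are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical region: one contiguous block, handed out front to back. */
struct region {
    unsigned char *base;   /* start of the contiguous block */
    size_t capacity;       /* total bytes in the block */
    size_t used;           /* bytes already handed out */
};

/* Reserve the whole region up front. Returns 0 on success, -1 on failure. */
static int region_init(struct region *r, size_t capacity)
{
    r->base = malloc(capacity);
    r->capacity = (r->base != NULL) ? capacity : 0;
    r->used = 0;
    return (r->base != NULL) ? 0 : -1;
}

/* Hand out the next bytes of the region, or NULL if they do not fit.
   Sizes are rounded up so every object stays suitably aligned. */
static void *region_alloc(struct region *r, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    size_t remaining = r->capacity - r->used;

    if (size > remaining)
        return NULL;
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > remaining)
        return NULL;

    void *p = r->base + r->used;
    r->used += rounded;
    return p;
}

/* Release every object in the region with one call. */
static void region_release(struct region *r)
{
    free(r->base);
    r->base = NULL;
    r->capacity = 0;
    r->used = 0;
}

/* Small usage example; returns 0 when everything behaves as expected. */
static int region_demo(void)
{
    struct region r;
    if (region_init(&r, 1024) != 0)
        return 1;
    void *a = region_alloc(&r, 10);    /* objects of different sizes */
    void *b = region_alloc(&r, 100);
    assert(a != NULL && b != NULL && a != b);
    assert(region_alloc(&r, 4096) == NULL);  /* larger than the region */
    region_release(&r);                /* frees a and b together */
    return 0;
}
```

Individual objects are never freed on their own here, which is what makes the single release possible and is also why such a region needs one contiguous block.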

Region based memory management

fkl answered Nov 15 '22 14:11