Whenever you study the memory allocation of processes you usually see it outlined like this:
So far so good.
But then you have the sbrk() system call which allows the program to change the upper limit of its data section, and it can also be used to simply check where that limit is with sbrk(0). Using that function I found the following patterns:
Pattern 1 - Small malloc
I ran the following program on my Linux machine:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int globalVar;

int main(void)
{
    int localVar;
    int *ptr;

    printf("localVar address (i.e., stack) = %p\n", (void *)&localVar);
    printf("globalVar address (i.e., data section) = %p\n", (void *)&globalVar);
    printf("Limit of data section = %p\n", sbrk(0));
    ptr = malloc(sizeof(int) * 1000);   /* small request */
    printf("ptr address (should be on stack)= %p\n", (void *)&ptr);
    printf("ptr points to: %p\n", (void *)ptr);
    printf("Limit of data section after malloc= %p\n", sbrk(0));
    free(ptr);
    return 0;
}
And the output is the following:
localVar address (i.e., stack) = 0xbfe34058
globalVar address (i.e., data section) = 0x804a024
Limit of data section = 0x91d9000
ptr address (should be on stack)= 0xbfe3405c
ptr points to: 0x91d9008
Limit of data section after malloc= 0x91fa000
As you can see the allocated memory region was right above the old data section limit, and after the malloc that limit was pushed upward, so the allocated region is actually inside the new data section.
Question 1: Does this mean that small mallocs will allocate memory in the data section and not use the heap at all?
Pattern 2 - Big malloc
If you increase the requested memory size in the malloc call:
ptr = malloc(sizeof(int)*100000);
you will now get the following output:
localVar address (i.e., stack) = 0xbf93ba68
globalVar address (i.e., data section) = 0x804a024
Limit of data section = 0x8b16000
ptr address (should be on stack)= 0xbf93ba6c
ptr points to: 0xb750b008
Limit of data section after malloc= 0x8b16000
As you can see here the limit of the data section has not changed, and instead the allocated memory region is in the middle of the gap section, between the data section and the stack.
Question 2: Is the large malloc actually using the heap?
Question 3: Any explanation for this behavior? I find it a bit insecure, because in the first example (small malloc), even after you free the allocated memory, you can still dereference the pointer and use that memory without getting a seg fault, since it is still inside your data section. This could lead to hard-to-detect bugs.
Update with Specs: Ubuntu 12.04, 32-bits, gcc version 4.6.3, Linux kernel 3.2.0-54-generic-pae.
Update 2: Rodrigo's answer below solved this mystery. This Wikipedia link also helped.
In C and C++ code, nearly all of your data is stored in one of two types of memory: the stack or the heap. Everything allocated by malloc (or new in C++) is stored in heap memory. When malloc is called, the pointer it returns always points to heap memory.
The first time, malloc creates a new space (the heap) for the program (by increasing the program break location).
In one allocator implementation, malloc(12) and malloc(16) each allocate 16 bytes for the user, plus an extra 8 bytes for bookkeeping, for a total of 24 bytes. malloc(100) allocates 104 bytes for the user, plus an extra 8 bytes for bookkeeping. The exact figures depend on the allocator.
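Those byte counts are specific to one allocator. As a rough way to check what your own allocator does, here is a minimal sketch, assuming glibc: malloc_usable_size() from <malloc.h> is a glibc extension, not standard C, and it reports only the usable size of a block, not the bookkeeping overhead. The helper names usable_bytes and print_usable_sizes are made up for this illustration.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size(): glibc extension, not ISO C */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `request` bytes, ask the allocator how many
 * bytes of that block are usable, then free it again.
 * Returns 0 if the allocation fails. */
size_t usable_bytes(size_t request)
{
    void *p = malloc(request);
    if (p == NULL)
        return 0;

    size_t usable = malloc_usable_size(p);
    assert(usable >= request);   /* the block is never smaller than requested */
    free(p);
    return usable;
}

/* Hypothetical helper: print requested vs. usable size for the request
 * sizes mentioned in the text. */
void print_usable_sizes(void)
{
    const size_t requests[] = { 12, 16, 100 };

    for (size_t i = 0; i < sizeof requests / sizeof requests[0]; i++)
        printf("malloc(%zu) -> %zu usable bytes\n",
               requests[i], usable_bytes(requests[i]));
}
```

Calling print_usable_sizes() from main() shows how far each request is rounded up on your system; the numbers will probably differ from the ones quoted above.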
The malloc subsystem manages a logical memory object called a heap. The heap is a region of memory that resides in the application's address space between the last byte of data allocated by the compiler and the end of the data region.
First of all, the only way to be absolutely sure of what happens is to read the source code of malloc. Or even better, step through it with the debugger.
But anyway, here is my understanding of these things:
sbrk() is used to increase the size of the data section, all right. Usually, you will not call it directly, but it will be called by the implementation of malloc() to increase the memory available for the heap.
malloc() does not allocate memory from the OS on each call. It just splits the data section into pieces and assigns these pieces to whoever needs them. You use free() to mark one piece as unused and available for reassignment.
That is a simplification, though. For big blocks, malloc() (at least in glibc) allocates them using mmap() with private, non-file-backed options. Thus, these blocks are outside of the data segment. Naturally, calling free() on such a block will call munmap().
What exactly counts as a big block depends on many details. See man mallopt for the gory details.
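The split between sbrk()-backed and mmap()-backed blocks can be observed from inside a program. The following is a minimal sketch, assuming glibc on Linux and the usual layout where mmap() regions sit above the program break. The helper names are made up for this illustration, and 128 KiB is the default M_MMAP_THRESHOLD that man mallopt documents.

```c
#define _DEFAULT_SOURCE 1   /* expose sbrk() in <unistd.h> on glibc */
#include <assert.h>
#include <malloc.h>         /* mallopt(), M_MMAP_THRESHOLD: glibc-specific */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: pin the mmap threshold at 128 KiB. Setting it
 * explicitly also switches off glibc's dynamic adjustment of the threshold.
 * Returns 1 on success, 0 on error (the mallopt() convention). */
int pin_mmap_threshold(void)
{
    return mallopt(M_MMAP_THRESHOLD, 128 * 1024);
}

/* Hypothetical helper: 1 if p lies below the current program break, else 0.
 * For a pointer returned by malloc(), "below the break" is taken to mean
 * "carved out of the sbrk()-grown heap" (an assumption about the layout). */
int below_program_break(const void *p)
{
    void *brk_now = sbrk(0);
    assert(brk_now != (void *)-1);
    return (uintptr_t)p < (uintptr_t)brk_now;
}

/* Allocate `size` bytes, classify the block, and free it again.
 * Returns 1 (below the break), 0 (above it), or -1 if malloc() failed. */
int malloc_lands_below_break(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return -1;

    int below = below_program_break(p);
    free(p);
    return below;
}

/* Print the classification for the two request sizes used in the question. */
void print_malloc_placement(void)
{
    const size_t sizes[] = { sizeof(int) * 1000, sizeof(int) * 100000 };

    pin_mmap_threshold();
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        printf("malloc(%zu) below the program break? %d\n",
               sizes[i], malloc_lands_below_break(sizes[i]));
}
```

Comparing against sbrk(0) is a simplification: it only separates the two cases for pointers that came from malloc() in the main thread, and a different allocator or a sanitizer build may place blocks elsewhere.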
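From that, you can guess what happens when you access free'd memory:
If the block was small, the memory is usually still mapped as part of the heap, so reading it will not fault. Writing to it may corrupt the allocator's internal structures, or the data of whoever has reused the block in the meantime.
If the block was big, the memory has been unmapped, so any access will result in a segmentation fault, unless in the meantime something else has called mmap() and the same address range happens to be used.
Clarification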
The term data section is used with two different meanings, depending on the context.
The .data section of the executable (linker point of view). It may also include .bss or even .rodata. For the OS that means nothing: it just maps pieces of the program into memory with little regard for what they contain other than the flags (read-only, executable...).
The heap, that is, the region of the process's memory that is grown with sbrk().
You can see that with the following command, which prints the memory layout of a simple program (cat):
$ cat /proc/self/maps
08048000-08053000 r-xp 00000000 00:0f 1821106 /usr/bin/cat
08053000-08054000 r--p 0000a000 00:0f 1821106 /usr/bin/cat
08054000-08055000 rw-p 0000b000 00:0f 1821106 /usr/bin/cat
09152000-09173000 rw-p 00000000 00:00 0 [heap]
b73df000-b75a5000 r--p 00000000 00:0f 2241249 /usr/lib/locale/locale-archive
b75a5000-b75a6000 rw-p 00000000 00:00 0
b75a6000-b774f000 r-xp 00000000 00:0f 2240939 /usr/lib/libc-2.18.so
b774f000-b7750000 ---p 001a9000 00:0f 2240939 /usr/lib/libc-2.18.so
b7750000-b7752000 r--p 001a9000 00:0f 2240939 /usr/lib/libc-2.18.so
b7752000-b7753000 rw-p 001ab000 00:0f 2240939 /usr/lib/libc-2.18.so
b7753000-b7756000 rw-p 00000000 00:00 0
b7781000-b7782000 rw-p 00000000 00:00 0
b7782000-b7783000 r-xp 00000000 00:00 0 [vdso]
b7783000-b77a3000 r-xp 00000000 00:0f 2240927 /usr/lib/ld-2.18.so
b77a3000-b77a4000 r--p 0001f000 00:0f 2240927 /usr/lib/ld-2.18.so
b77a4000-b77a5000 rw-p 00020000 00:0f 2240927 /usr/lib/ld-2.18.so
bfba0000-bfbc1000 rw-p 00000000 00:00 0 [stack]
The first line is the executable code (.text section).
The second line is the read-only data (.rodata section) and some other read-only sections.
The third line is the .data + .bss and some other writable sections.
The fourth line is the heap!
The next lines, those with a name, are memory-mapped files or shared objects. Those without a name are probably big malloc'ed blocks of memory (or maybe other private anonymous mmap's; the two are impossible to distinguish).
The last line is the stack!
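A program can read the same listing about itself. This is a minimal, Linux-only sketch that scans /proc/self/maps for the [heap] line and checks whether a malloc'ed block falls inside it. The helper names are made up for this illustration, and the expected placement assumes glibc's default settings.

```c
#include <assert.h>
#include <inttypes.h>   /* SCNxPTR, PRIxPTR */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find the line tagged "[heap]" in /proc/self/maps and
 * store its address range in *start and *end.
 * Returns 1 if found, 0 otherwise (for example on a non-Linux system). */
int find_heap_range(uintptr_t *start, uintptr_t *end)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return 0;

    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, "[heap]") == NULL)
            continue;
        /* Each line begins with "start-end" in hexadecimal. */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, start, end) == 2) {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}

/* Allocate `size` bytes and report whether the block lies inside the
 * [heap] mapping: 1 = inside, 0 = outside, -1 = could not tell. */
int malloc_lands_in_heap_mapping(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return -1;

    uintptr_t start, end;
    int result = -1;
    if (find_heap_range(&start, &end)) {
        assert(start < end);
        uintptr_t addr = (uintptr_t)p;
        result = (addr >= start && addr < end);
    }
    free(p);
    return result;
}

/* Print the [heap] range of the current process, if there is one. */
void print_heap_range(void)
{
    uintptr_t start, end;

    if (find_heap_range(&start, &end))
        printf("[heap] = %" PRIxPTR "-%" PRIxPTR "\n", start, end);
    else
        printf("no [heap] mapping found\n");
}
```

With the two request sizes from the question, the small block would be expected inside the [heap] range and the big one in one of the unnamed mappings, but that depends on the allocator and its thresholds.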