Does Malloc only use the heap if requested memory space is large?

Whenever you study the memory allocation of processes you usually see it outlined like this:

[image: diagram of the typical process memory layout]

So far so good.

But then you have the sbrk() system call which allows the program to change the upper limit of its data section, and it can also be used to simply check where that limit is with sbrk(0). Using that function I found the following patterns:

Pattern 1 - Small malloc

I ran the following program on my Linux machine:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int globalVar;

int main(void){
        int localVar;
        int *ptr;

        printf("localVar address (i.e., stack) = %p\n",(void *)&localVar);
        printf("globalVar address (i.e., data section) = %p\n",(void *)&globalVar);
        printf("Limit of data section = %p\n",sbrk(0));

        ptr = malloc(sizeof(int)*1000);

        printf("ptr address (should be on stack)= %p\n",(void *)&ptr);
        printf("ptr points to: %p\n",(void *)ptr);
        printf("Limit of data section after malloc= %p\n",sbrk(0));

        return 0;
}

And the output is the following:

localVar address (i.e., stack) = 0xbfe34058
globalVar address (i.e., data section) = 0x804a024
Limit of data section = 0x91d9000
ptr address (should be on stack)= 0xbfe3405c
ptr points to: 0x91d9008
Limit of data section after malloc= 0x91fa000

As you can see, the allocated memory region starts right above the old data section limit, and after the malloc that limit was pushed upward, so the allocated region is actually inside the new data section.

Question 1: Does this mean that small mallocs will allocate memory in the data section and not use the heap at all?

Pattern 2 - Big Malloc

If you increase the requested memory size on line 15:

ptr = malloc(sizeof(int)*100000);

you will now get the following output:

localVar address (i.e., stack) = 0xbf93ba68
globalVar address (i.e., data section) = 0x804a024
Limit of data section = 0x8b16000
ptr address (should be on stack)= 0xbf93ba6c
ptr points to: 0xb750b008
Limit of data section after malloc= 0x8b16000

As you can see here, the limit of the data section has not changed; instead, the allocated memory region is in the middle of the gap between the data section and the stack.

Question 2: Is the large malloc actually using the heap?

Question 3: Any explanation for this behavior? I find it a bit insecure, because in the first example (small malloc), even after you free the allocated memory, you can still use the pointer and access that memory without getting a segmentation fault, since it is inside your data section. This could lead to hard-to-detect bugs.

Update with Specs: Ubuntu 12.04, 32-bits, gcc version 4.6.3, Linux kernel 3.2.0-54-generic-pae.

Update 2: Rodrigo's answer below solved this mystery. This Wikipedia link also helped.

Daniel Scocco asked Oct 10 '13 19:10



1 Answer

First of all, the only way to be absolutely sure of what happens is to read the source code of malloc. Or even better, step through it with the debugger.

But anyway, here is my understanding of these things:

  1. The system call sbrk() is indeed used to increase the size of the data section. Usually you will not call it directly; the implementation of malloc() calls it to increase the memory available for the heap.
  2. The function malloc() does not allocate memory from the OS. It just splits the data section into pieces and assigns these pieces to whoever needs them. You use free() to mark a piece as unused and available for reassignment.
  3. Point 2 is an oversimplification. At least in the glibc implementation (the C library normally used with GCC on Linux), malloc() allocates big blocks with mmap(), using private, non-file-backed mappings. Thus, these blocks are outside of the data segment. Calling free() on such a block calls munmap().

What exactly counts as a big block depends on many details; in glibc the limit is the M_MMAP_THRESHOLD parameter, 128 KiB by default. See man mallopt for the gory details.

From that, you can guess what happens when you access freed memory:

  1. If the block was small, the memory will still be there, so if you read it nothing will happen. If you write to it, you may corrupt the internal heap structures, or the block may have been reused and you may corrupt some unrelated structure.
  2. If the block was big, the memory has been unmapped, so any access will result in a segmentation fault, except in the improbable case that, in the interim, another big block has been allocated (or another thread has called mmap()) and the same address range happens to be reused.

Clarification

The term data section is used with two different meanings, depending on the context.

  1. The .data section of the executable (linker point of view). It may also include .bss or even .rodata. For the OS that means nothing: it just maps pieces of the program into memory with little regard for what they contain, other than the flags (read-only, executable...).
  2. The heap, that block of memory that every process has, that is not read from the executable, and that can be grown using sbrk().

You can see that with the following command that prints the memory layout of a simple program (cat):

$ cat /proc/self/maps
08048000-08053000 r-xp 00000000 00:0f 1821106    /usr/bin/cat
08053000-08054000 r--p 0000a000 00:0f 1821106    /usr/bin/cat
08054000-08055000 rw-p 0000b000 00:0f 1821106    /usr/bin/cat
09152000-09173000 rw-p 00000000 00:00 0          [heap]
b73df000-b75a5000 r--p 00000000 00:0f 2241249    /usr/lib/locale/locale-archive
b75a5000-b75a6000 rw-p 00000000 00:00 0 
b75a6000-b774f000 r-xp 00000000 00:0f 2240939    /usr/lib/libc-2.18.so
b774f000-b7750000 ---p 001a9000 00:0f 2240939    /usr/lib/libc-2.18.so
b7750000-b7752000 r--p 001a9000 00:0f 2240939    /usr/lib/libc-2.18.so
b7752000-b7753000 rw-p 001ab000 00:0f 2240939    /usr/lib/libc-2.18.so
b7753000-b7756000 rw-p 00000000 00:00 0 
b7781000-b7782000 rw-p 00000000 00:00 0 
b7782000-b7783000 r-xp 00000000 00:00 0          [vdso]
b7783000-b77a3000 r-xp 00000000 00:0f 2240927    /usr/lib/ld-2.18.so
b77a3000-b77a4000 r--p 0001f000 00:0f 2240927    /usr/lib/ld-2.18.so
b77a4000-b77a5000 rw-p 00020000 00:0f 2240927    /usr/lib/ld-2.18.so
bfba0000-bfbc1000 rw-p 00000000 00:00 0          [stack]

The first line is the executable code (.text section).

The second line is the read-only data (.rodata section) and some other read-only sections.

The third line is the .data + .bss and some other writable sections.

The fourth line is the heap!

The next lines with a name are memory-mapped files or shared objects. Those without a name are probably big malloc'ed blocks of memory (or maybe private anonymous mmaps; the two are impossible to distinguish).

The last line is the stack!

rodrigo answered Oct 07 '22 12:10