
Understanding printf in C

Tags: c, linux, printf

I'm trying to understand how printf works in C for a simple case. I wrote the following program:

#include <stdio.h>

int main(int argc, char const *argv[])
{
    /* Expects one command-line argument: argv[1] is NULL without it. */
    printf("Test %s\n", argv[1]);
    return 0;
}

Running objdump on the binary, I noticed that the format string Test %s\n resides in .rodata:

objdump -sj .rodata bin

bin:     file format elf64-x86-64

Contents of section .rodata:
 08e0 01000200 54657374 2025730a 00        ....Test %s..

So formatted printing seems to copy the pattern from .rodata to somewhere else.

After compiling it and running it under strace (strace ./bin rr), I noticed a brk syscall before the actual write. Running it under gdb with

(gdb) catch syscall brk
(gdb) catch syscall write

shows that in my case the current break is 0x555555756000 and is then set to 0x555555777000. When the write occurs, the formatted string

x/s $rsi
0x555555756260: "Test rr\n"

resides between the "old" and the "new" break. After the write, the program exits.
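For reference, the arithmetic behind those two break values can be sketched in C. The helper names break_growth_bytes and in_grown_region are made up for this illustration, and the page count assumes the usual 4096-byte page size on x86-64.

```c
#include <stdint.h>

/* Number of bytes the heap grew when the break moved from old_brk to new_brk. */
uint64_t break_growth_bytes(uint64_t old_brk, uint64_t new_brk)
{
    return new_brk - old_brk;
}

/* Nonzero if addr lies in [old_brk, new_brk), the region added by the brk call. */
int in_grown_region(uint64_t addr, uint64_t old_brk, uint64_t new_brk)
{
    return addr >= old_brk && addr < new_brk;
}
```

With the values above, break_growth_bytes(0x555555756000, 0x555555777000) is 0x21000 = 135168 bytes, i.e. 33 pages of 4096 bytes (132 KiB), and the buffer at 0x555555756260 sits 0x260 bytes past the old break.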

QUESTION: Why are so many pages allocated, and why doesn't the break return to its previous value after the write syscall? Is there any reason to use brk instead of mmap for such formatting?

asked Jan 08 '19 11:01 by St.Antario


1 Answer

brk() (and its companion sbrk()) moves the program break, i.e. the end of the heap. You can think of it as a kind of mmap() specialized for changing the heap size. It exists for historical reasons; libc could also use mmap() or mremap() directly.
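A minimal sketch of what moving the break looks like from user code, assuming Linux with glibc. The helper name move_break is made up for this example, and _DEFAULT_SOURCE is defined only so that <unistd.h> declares sbrk() under strict compiler modes. Real programs should leave the break to malloc(); mixing manual sbrk() calls with the allocator is fragile.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc, exposes sbrk() in <unistd.h> */
#include <unistd.h>

/* Moves the program break by `delta` bytes and returns how far it actually
   moved, or -1 if sbrk() failed. sbrk(0) only queries the current break. */
long move_break(long delta)
{
    char *before = sbrk(0);
    if (sbrk(delta) == (void *)-1)
        return -1;
    char *after = sbrk(0);
    return (long)(after - before);
}
```

Calling move_break(4096) extends the heap by one 4096-byte block, and move_break(-4096) gives it back.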

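The heap is expanded as additional memory is allocated, for example with malloc(). libc does this internally, for example to get enough space to build the actual string from the format string and the arguments, or for many other internal purposes (e.g. the output buffers used for buffered I/O by the f* family of functions). An allocator typically grows the heap by more than the current request needs, so that it does not have to call brk() for every small allocation; that is why the break moves by many pages at once.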
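One way to observe this from inside the program is to query the break before and after a printf() call. This is only a sketch: the helper name break_growth_around_printf is invented here, and the result depends on the libc and on whether anything has already allocated memory or written to stdout. On glibc the first call in a fresh process typically reports a positive growth, and later calls usually report 0 because the buffer already exists.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc, exposes sbrk() in <unistd.h> */
#include <stdio.h>
#include <unistd.h>

/* Returns how many bytes the program break moved while printf() ran. */
long break_growth_around_printf(const char *arg)
{
    char *before = sbrk(0);      /* sbrk(0) queries the break without moving it */
    printf("Test %s\n", arg);
    fflush(stdout);              /* force the write syscall to happen here */
    char *after = sbrk(0);
    return (long)(after - before);
}
```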

If parts of the heap are no longer used, they are often not returned to the system automatically, for two main reasons: the heap may be fragmented, and/or the unused space at the top of the heap does not exceed the threshold that would justify the operation, because the memory might be needed again soon.
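The effect can be sketched by recording the break around a small malloc()/free() pair and then asking the allocator explicitly to give memory back. This assumes glibc: malloc_trim() is a glibc extension declared in <malloc.h>. The struct break_trace and the function trace_small_allocation are names made up for this illustration, and the exact addresses will differ between systems and runs.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc, exposes sbrk() in <unistd.h> */
#include <malloc.h>       /* malloc_trim(): glibc extension */
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

struct break_trace {
    uintptr_t start;         /* break before the allocation */
    uintptr_t after_malloc;  /* break after malloc(size) */
    uintptr_t after_free;    /* break after free() */
    uintptr_t after_trim;    /* break after malloc_trim(0) */
};

struct break_trace trace_small_allocation(size_t size)
{
    struct break_trace t;
    t.start = (uintptr_t)sbrk(0);
    void *p = malloc(size);
    if (p != NULL)
        ((volatile char *)p)[0] = 1;   /* keep the compiler from eliding malloc/free */
    t.after_malloc = (uintptr_t)sbrk(0);
    free(p);
    t.after_free = (uintptr_t)sbrk(0);
    malloc_trim(0);                    /* ask glibc to release free memory at the top */
    t.after_trim = (uintptr_t)sbrk(0);
    return t;
}
```

For a small size such as 1024, after_free will usually equal after_malloc on glibc, which matches what the question observed: freeing alone does not move the break back.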

As a side note: the format string itself is certainly not copied from the read-only section to the heap; that would be pointless. But the resulting string is (usually) built on the heap.
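To make the distinction visible, here is a small sketch that uses snprintf() with a caller-supplied buffer instead of printf(). The format string stays where it is and is only read, while the expanded result is written to a separate buffer. With printf() the destination would be stdout's buffer instead. The names test_format and format_test_line are made up for this example.

```c
#include <stdio.h>

/* The pattern lives in read-only data and is only read, never copied. */
static const char test_format[] = "Test %s\n";

/* Expands the pattern into `out` and returns the length of the result
   (not counting the terminating NUL), as reported by snprintf(). */
int format_test_line(char *out, size_t out_size, const char *arg)
{
    return snprintf(out, out_size, test_format, arg);
}
```

Called with a 32-byte buffer and the argument "rr", this returns 8 and leaves "Test rr\n" in the buffer.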

answered Nov 14 '22 23:11 by Ctx