I'm trying to understand how printf works in C for a simple case. I wrote the following program:
#include <stdio.h>

int main(int argc, char const *argv[])
{
    printf("Test %s\n", argv[1]);
    return 0;
}
Running objdump on the binary, I noticed that the string Test %s\n resides in .rodata:
objdump -sj .rodata bin
bin: file format elf64-x86-64
Contents of section .rodata:
08e0 01000200 54657374 2025730a 00 ....Test %s..
So the formatted print seems to perform an additional copy of the pattern from .rodata to somewhere else.
After compiling it and running it with strace ./bin rr, I noticed a brk syscall before the actual write. So running it with
gdb catch syscall brk
gdb catch syscall write
shows that in my case the current break is 0x555555756000, but it is then set to 0x555555777000. When the write occurs, the formatted string
x/s $rsi
0x555555756260: "Test rr\n"
resides between the "old" and "new" break. After the write occurs, the program exits.
QUESTION: Why do we allocate so many pages, and why doesn't the break return to its previous value after the write syscall occurs? Is there any reason to use brk instead of mmap for such formatting?
brk() (and its companion sbrk()) moves the program break, i.e. the end of the heap. You can think of it as a kind of mmap() specialized to manipulate the heap size. It is there for historical reasons; the libc could also use mmap() or mremap() directly.
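The heap is expanded as additional memory is allocated, for example with malloc(). The libc also calls malloc() internally, for example to have enough space to create the actual string from the format string and the parameters, or for many other internal things (e.g. the output buffers when using buffered I/O with the f* function family). In the question the break moved from 0x555555756000 to 0x555555777000, i.e. by 0x21000 bytes (132 KiB); allocators typically request more than the immediate need so that later allocations do not each require a system call.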
If some parts of the heap are no longer used, they are often not automatically returned to the system, for two main reasons: the heap may be fragmented, and/or the unused part is not large enough to exceed the threshold that would justify the operation, because the memory might be needed again soon.
As a side note: the format string itself is certainly not copied from the read-only section to the heap; that would be pointless. But the result string is (usually) built on the heap.