I was writing some assembly code for a project of mine and noticed something interesting: the linked binary is very large. I tested repeatedly, and even with the smallest possible program the output ELF binary is still large. For example:
.section .text
.global _start
_start:
movl $1,%eax
movl $0,%ebx
int $0x80
After assembling and linking the code above, the resulting binary is more than 4 KB!
The funny thing is that most of the binary is filled with zeroes.
I tried many things to find the cause, without success.
Can someone please explain what the problem is here?
I simply assemble and link the file:
as -o <OBJ_NAME> <SOURCE NAME>
ld -o <ELF_NAME> <OBJ_NAME>
Recommendations for further reading would be appreciated.
As you may have guessed, I use 64-bit GNU/Linux.
Thanks.
This has to do with alignment. See readelf -eW <ELF_NAME>. The interesting bit is
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 0000000000401000 001000 00000c 00 AX 0 0 1
Note the Off column. This is the offset in the file, and the .text section starts at 0x1000, which is 4K.
Same picture if you look at the program headers. The space that is filled with zeroes is between the end of the ELF header and 0x1000.
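One way to confirm how much of the file is padding is to count its zero bytes. The following is a minimal shell sketch; the helper name count_zero_bytes is hypothetical (it is not part of binutils), and it assumes only the standard wc and tr utilities:

```shell
# Hypothetical helper: print the number of zero bytes in the file "$1".
count_zero_bytes() {
    total=$(wc -c < "$1")                    # size of the file in bytes
    nonzero=$(tr -d '\000' < "$1" | wc -c)   # bytes left after deleting NULs
    echo $(( total - nonzero ))
}

# Usage: count_zero_bytes <ELF_NAME>
```

Comparing the result with the file size reported by wc -c <ELF_NAME> gives a rough idea of how much of the binary is padding.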
Why is this?
First, because the ELF standard dictates that
Loadable process segments must have congruent values for p_vaddr and p_offset, modulo the page size.
(see man elf). The page size on your system (mine as well) is 4K. This is the value that you see in p_align.
Second, the virtual address the linker has assigned to the start of the "text" segment (the same as for the .text section, because that is all the segment contains here) is 0x0000000000401000. Therefore the hexadecimal representation of the "text" segment's offset in the file has to end with 000. But 0 is already taken by the read-only segment containing the ELF header (the very beginning of the file). The next choice is 0x1000.
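The congruence rule can be checked with plain shell arithmetic. This is only an illustrative sketch: is_congruent is a hypothetical helper name, and the arguments are the address, offset and alignment values taken from the readelf output.

```shell
# Hypothetical helper: succeed when vaddr ($1) and file offset ($2)
# are congruent modulo the alignment ($3).
is_congruent() {
    [ $(( $1 % $3 )) -eq $(( $2 % $3 )) ]
}

# The layout the linker chose: vaddr 0x401000, offset 0x1000, align 0x1000.
is_congruent 0x401000 0x1000 0x1000 && echo "offset 0x1000: congruent"

# Placing the same segment right after the headers would break the rule.
is_congruent 0x401000 0x78 0x1000 || echo "offset 0x78: not congruent"
```

With the virtual address fixed at 0x401000, only file offsets that are multiples of 0x1000 pass the check, which is why the segment cannot simply follow the headers.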
Why did the linker choose 0x401000 as the virtual address for the text section? I don't know. I think that if you tweak the linker script a little, you'll be able to get a smaller resulting executable.
As Peter and that other guy have pointed out, page-size alignment can be disabled using the -n linker option:
'-n'
'--nmagic'
Turn off page alignment of sections, and disable linking against
shared libraries[…]
That way I get
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 1] .text PROGBITS 0000000000400078 000078 00000c 00 AX 0 0 1
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000078 0x0000000000400078 0x0000000000400078 0x00000c 0x00000c R E 0x1
and the size of the executable is down to 664 bytes (344 after stripping).
With GNU ld, you can use linker scripts to fine-control the layout of linker output files. ld.bfd (usually also known as just ld) interprets a default linker script if the user doesn't specify one. It can be obtained with ld --verbose. You can then edit it and supply your version instead of the default with -T <your-script>.
I edited out the first occurrence of
. = ALIGN(CONSTANT (MAXPAGESIZE));
(before .text) and got 720 bytes (400 when stripped). This is different from the result of using the -n option. You still get 2 loadable segments, and their p_align is still 0x1000.
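The manual edit can be scripted roughly as follows. This is a sketch under assumptions, not a definitive recipe: the function names are hypothetical, it assumes that ld --verbose prints the default script between two lines of = characters, and it assumes the alignment statement sits on a line of its own. The default script differs between binutils versions, so check the result before using it.

```shell
# Hypothetical helper: keep only the linker script from `ld --verbose`
# output read on stdin (the part between the two "=====" marker lines).
extract_linker_script() {
    sed -n '/^=====/,/^=====/p' | sed '1d;$d'
}

# Hypothetical helper: drop the first line that consists solely of
# ". = ALIGN(CONSTANT (MAXPAGESIZE));" and pass everything else through.
drop_first_page_align() {
    awk '!done && /^[ \t]*\. = ALIGN\(CONSTANT \(MAXPAGESIZE\)\);[ \t]*$/ { done = 1; next } { print }'
}

# Usage (assumes GNU ld is installed; custom.ld is an arbitrary file name):
#   ld --verbose | extract_linker_script | drop_first_page_align > custom.ld
#   ld -T custom.ld -o <ELF_NAME> <OBJ_NAME>
```

Running readelf -eW <ELF_NAME> on the result shows whether the segment offsets and p_align values changed as expected.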
There are efficiency implications for having p_align < MAX_PAGE_SIZE that I don't fully understand. (Pages won't be loaded as fast due to harder address computation? I think there should be a better explanation.) Feel free to edit the answer if you know more about this or where it's explained.