 

Query on -ffunction-sections & -fdata-sections options of gcc

The GCC documentation says the following about the function-sections and data-sections options:

-ffunction-sections -fdata-sections 

Place each function or data item into its own section in the output file if the target supports arbitrary sections. The name of the function or the name of the data item determines the section's name in the output file. Use these options on systems where the linker can perform optimizations to improve locality of reference in the instruction space. Most systems using the ELF object format and SPARC processors running Solaris 2 have linkers with such optimizations. AIX may have these optimizations in the future.

Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker will create larger object and executable files and will also be slower. You will not be able to use gprof on all systems if you specify this option and you may have problems with debugging if you specify both this option and -g.

I was under the impression that these options will help in reducing the executable file size. Why does this page say that it will create larger executable files? Am I missing something?

Jay asked Nov 25 '10 07:11




2 Answers

Interestingly, using -fdata-sections can make the literal pools of your functions, and thus the functions themselves, larger. I've noticed this on ARM in particular, but it's likely to be true elsewhere. The binary I was testing only grew by a quarter of a percent, but it did grow. Looking at the disassembly of the changed functions, it was clear why.

If all of the BSS (or DATA) entries in your object file are allocated to a single section, then the compiler can store the address of that section in the function's literal pool and generate loads with known offsets from that address to access your data. But if you enable -fdata-sections, each piece of BSS (or DATA) data goes into its own section. Since the compiler doesn't know which of these sections might be garbage collected later, or in what order the linker will place them in the final executable image, it can no longer load data using offsets from a single address. Instead, it has to allocate one literal pool entry for each data item used. Once the linker has figured out what is going into the final image and where, it fixes up these literal pool entries with the actual addresses of the data.

So yes, even with -Wl,--gc-sections the resulting image can be larger because the actual function text is larger.

Below I've added a minimal example

The code below is enough to see the behavior I'm talking about. Please don't be thrown off by the volatile declaration and use of global variables, both of which are questionable in real code. Here they ensure the creation of two data sections when -fdata-sections is used.

    static volatile int head;
    static volatile int tail;

    int queue_empty(void)
    {
        return head == tail;
    }

The version of GCC used for this test is:

gcc version 6.1.1 20160526 (Arch Repository) 

First, without -fdata-sections we get the following.

    > arm-none-eabi-gcc -march=armv6-m \
                        -mcpu=cortex-m0 \
                        -mthumb \
                        -Os \
                        -c \
                        -o test.o \
                        test.c

    > arm-none-eabi-objdump -dr test.o

    00000000 <queue_empty>:
     0: 4b03     ldr   r3, [pc, #12]   ; (10 <queue_empty+0x10>)
     2: 6818     ldr   r0, [r3, #0]
     4: 685b     ldr   r3, [r3, #4]
     6: 1ac0     subs  r0, r0, r3
     8: 4243     negs  r3, r0
     a: 4158     adcs  r0, r3
     c: 4770     bx    lr
     e: 46c0     nop                   ; (mov r8, r8)
    10: 00000000 .word 0x00000000
                 10: R_ARM_ABS32 .bss

    > arm-none-eabi-nm -S test.o

    00000000 00000004 b head
    00000000 00000014 T queue_empty
    00000004 00000004 b tail

From arm-none-eabi-nm we see that queue_empty is 20 bytes long (14 hex), and the arm-none-eabi-objdump output shows that there is a single relocation word at the end of the function: the address of the BSS section (the section for uninitialized data). The first instruction in the function loads that value (the address of the BSS) into r3. The next two instructions load relative to r3, offsetting by 0 and 4 bytes respectively. These two loads fetch the values of head and tail. We can see those offsets in the first column of the output from arm-none-eabi-nm. The nop at the end of the function is there to word-align the address of the literal pool.

Next we'll see what happens when -fdata-sections is added.

    > arm-none-eabi-gcc -march=armv6-m \
                        -mcpu=cortex-m0 \
                        -mthumb \
                        -Os \
                        -fdata-sections \
                        -c \
                        -o test.o \
                        test.c

    > arm-none-eabi-objdump -dr test.o

    00000000 <queue_empty>:
     0: 4b03     ldr   r3, [pc, #12]    ; (10 <queue_empty+0x10>)
     2: 6818     ldr   r0, [r3, #0]
     4: 4b03     ldr   r3, [pc, #12]    ; (14 <queue_empty+0x14>)
     6: 681b     ldr   r3, [r3, #0]
     8: 1ac0     subs  r0, r0, r3
     a: 4243     negs  r3, r0
     c: 4158     adcs  r0, r3
     e: 4770     bx    lr
        ...
                 10: R_ARM_ABS32 .bss.head
                 14: R_ARM_ABS32 .bss.tail

    > arm-none-eabi-nm -S test.o

    00000000 00000004 b head
    00000000 00000018 T queue_empty
    00000000 00000004 b tail

Immediately we see that the length of queue_empty has increased by four bytes to 24 bytes (18 hex), and that there are now two relocations to be done in queue_empty's literal pool. These relocations correspond to the addresses of the two BSS sections that were created, one for each global variable. There need to be two addresses here because the compiler can't know the relative positions in which the linker will end up placing the two sections. Looking at the instructions at the beginning of queue_empty, we see that there is an extra load: the compiler has to generate a separate pair of loads for each variable, first the address of its section and then the value stored there. The extra instruction in this version of queue_empty doesn't make the body of the function longer; it just takes the spot that was previously a nop. That won't be the case in general.
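As a rough sketch of how the single base address could be kept even with -fdata-sections, the two variables can be grouped into one object. This is not part of the test above, and the names queue_state and queue are made up for illustration. Because a struct is a single data item, it lands in a single section (presumably .bss.queue), so the compiler should again be able to use one literal pool entry and load both fields at fixed offsets from it. I have not checked the generated code on every GCC version, so treat it as an assumption to verify with objdump.

```c
/* Hypothetical variant of the test file: both indices live in one object. */
struct queue_state {
    volatile int head;
    volatile int tail;
};

/* One object means one section, even when -fdata-sections is enabled. */
static struct queue_state queue;

int queue_empty(void)
{
    /* Both fields sit at known offsets from the address of queue. */
    return queue.head == queue.tail;
}
```

Compiling this with the same command line as above and comparing the arm-none-eabi-nm -S sizes is the quickest way to see whether the function shrinks back on a given toolchain.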

Anton Staaf answered Sep 23 '22 09:09


When using those compiler options, you can also pass the linker option -Wl,--gc-sections, which removes all unreferenced sections and, with them, the unused code and data.
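A small sketch of what that looks like in practice, assuming a GNU toolchain. The file name gc_demo.c and the function names are hypothetical. Compiled with -ffunction-sections, each function below should get its own section (.text.used_helper, .text.unused_helper, .text.run_demo). If the file is then linked with -Wl,--gc-sections into a program whose main calls only run_demo, the linker is free to drop .text.unused_helper. Adding -Wl,--print-gc-sections asks GNU ld to list the sections it discarded.

```c
/* gc_demo.c (hypothetical file name, used only for this sketch) */

/* Reached from run_demo(), so its section is kept by the linker. */
int used_helper(int x)
{
    return x + 1;
}

/* Never referenced anywhere. It has external linkage, so the compiler
   must still emit it; only the linker's section garbage collection
   can remove it from the final image. */
int unused_helper(int x)
{
    return x * 2;
}

/* Entry point of the sketch; returns 0 when used_helper() behaves. */
int run_demo(void)
{
    return used_helper(41) == 42 ? 0 : 1;
}
```

Without -ffunction-sections all three functions would share one .text section in the object file, and the linker could only keep or discard them together.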

leppie answered Sep 24 '22 09:09