Query on -ffunction-sections & -fdata-sections options of gcc

Tags: c, optimization, gcc, linker, size

The GCC documentation says the following about the function-sections and data-sections options:

-ffunction-sections -fdata-sections 
Place each function or data item into its own section in the output file if the target supports arbitrary sections. The name of the function or the name of the data item determines the section's name in the output file. Use these options on systems where the linker can perform optimizations to improve locality of reference in the instruction space. Most systems using the ELF object format and SPARC processors running Solaris 2 have linkers with such optimizations. AIX may have these optimizations in the future.

Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker will create larger object and executable files and will also be slower. You will not be able to use gprof on all systems if you specify this option and you may have problems with debugging if you specify both this option and -g.

I was under the impression that these options will help in reducing the executable file size. Why does this page say that it will create larger executable files? Am I missing something?


asked Nov 25 '10 07:11

Jay

2 Answers

Interestingly, using -fdata-sections can make the literal pools of your functions, and thus the functions themselves, larger. I've noticed this on ARM in particular, but it's likely to be true elsewhere. The binary I was testing only grew by a quarter of a percent, but it did grow. Looking at the disassembly of the changed functions, it was clear why.

If all of the BSS (or DATA) entries in your object file are allocated to a single section, then the compiler can store the address of that section in the function's literal pool and access your data with loads at known offsets from that address. But if you enable -fdata-sections, it puts each piece of BSS (or DATA) data into its own section. Since the compiler doesn't know which of these sections might be garbage collected later, or in what order the linker will place them in the final executable image, it can no longer load data using offsets from a single address. Instead, it has to allocate one literal pool entry for each data item the function uses. Once the linker has figured out what is going into the final image and where, it fixes up these literal pool entries with the actual addresses of the data.

So yes, even with -Wl,--gc-sections the resulting image can be larger because the actual function text is larger.

Below I've added a minimal example

The code below is enough to see the behavior I'm talking about. Please don't be thrown off by the volatile declaration and use of global variables, both of which are questionable in real code. Here they ensure the creation of two data sections when -fdata-sections is used.

    static volatile int head;
    static volatile int tail;

    int queue_empty(void)
    {
        return head == tail;
    }

The version of GCC used for this test is:

gcc version 6.1.1 20160526 (Arch Repository)

First, without -fdata-sections we get the following.

    > arm-none-eabi-gcc -march=armv6-m \
                        -mcpu=cortex-m0 \
                        -mthumb \
                        -Os \
                        -c \
                        -o test.o \
                        test.c

    > arm-none-eabi-objdump -dr test.o

    00000000 <queue_empty>:
     0: 4b03     ldr   r3, [pc, #12]   ; (10 <queue_empty+0x10>)
     2: 6818     ldr   r0, [r3, #0]
     4: 685b     ldr   r3, [r3, #4]
     6: 1ac0     subs  r0, r0, r3
     8: 4243     negs  r3, r0
     a: 4158     adcs  r0, r3
     c: 4770     bx    lr
     e: 46c0     nop                   ; (mov r8, r8)
    10: 00000000 .word 0x00000000
                 10: R_ARM_ABS32 .bss

    > arm-none-eabi-nm -S test.o

    00000000 00000004 b head
    00000000 00000014 T queue_empty
    00000004 00000004 b tail

From arm-none-eabi-nm we see that queue_empty is 20 bytes long (14 hex), and the arm-none-eabi-objdump output shows a single relocation word at the end of the function: the address of the BSS section (the section for uninitialized data). The first instruction in the function loads that value (the address of the BSS) into r3. The next two instructions load relative to r3, at offsets of 0 and 4 bytes respectively. These two loads fetch the values of head and tail; their offsets appear in the first column of the arm-none-eabi-nm output. The nop at the end of the function is there to word-align the literal pool.

Next we'll see what happens when -fdata-sections is added.

    > arm-none-eabi-gcc -march=armv6-m \
                        -mcpu=cortex-m0 \
                        -mthumb \
                        -Os \
                        -fdata-sections \
                        -c \
                        -o test.o \
                        test.c

    > arm-none-eabi-objdump -dr test.o

    00000000 <queue_empty>:
     0: 4b03     ldr   r3, [pc, #12]    ; (10 <queue_empty+0x10>)
     2: 6818     ldr   r0, [r3, #0]
     4: 4b03     ldr   r3, [pc, #12]    ; (14 <queue_empty+0x14>)
     6: 681b     ldr   r3, [r3, #0]
     8: 1ac0     subs  r0, r0, r3
     a: 4243     negs  r3, r0
     c: 4158     adcs  r0, r3
     e: 4770     bx    lr
        ...
                 10: R_ARM_ABS32 .bss.head
                 14: R_ARM_ABS32 .bss.tail

    > arm-none-eabi-nm -S test.o

    00000000 00000004 b head
    00000000 00000018 T queue_empty
    00000000 00000004 b tail

Immediately we see that queue_empty has grown by four bytes to 24 bytes (18 hex), and that there are now two relocations in queue_empty's literal pool. These relocations correspond to the addresses of the two BSS sections that were created, one for each global variable. Two addresses are needed because the compiler can't know where the linker will place the two sections relative to each other. Looking at the instructions at the beginning of queue_empty, we see an extra load: the compiler has to generate a separate pair of loads for each variable, one to get the address of its section and one to get the value stored there. In this version of queue_empty the extra instruction doesn't make the body of the function longer, because it takes the slot that was previously a nop, but that won't be the case in general.

answered Sep 23 '22 09:09

Anton Staaf

When using those compiler options, you can add the linker option -Wl,--gc-sections, which removes all unused sections and, with them, the unused code and data they contain.

answered Sep 24 '22 09:09

leppie

