I'm performing some tests with gcc
to understand the rule(s) by which it intelligently excludes unused symbols.
// main.c
#include <stdio.h>
void foo()
{
}
int main( int argc, char* argv[] )
{
    return 0;
}

// bar.c
int bar()
{
    return 42;
}
> gcc --version
gcc (GCC) 8.2.1 20181215 (Red Hat 8.2.1-6)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> gcc -c bar.c
> gcc -g main.c bar.o
> nm a.out | grep "foo\|bar"
000000000040111f T bar
0000000000401106 T foo
Above, I've compiled bar.c to bar.o and linked it into a.out while compiling main.c. Listing a.out's symbols shows that both unused functions, foo() and bar(), are included in the executable.
> ar -r libbar.a bar.o
ar: creating libbar.a
> gcc -g main.c -L ./ -lbar
> nm a.out | grep "foo\|bar"
0000000000401106 T foo
Above, I've archived bar.o into libbar.a and recreated a.out, this time linking with libbar.a instead of bar.o. This time around, the unused function foo() is still present, but bar() is not.
From this experiment, I might surmise the following "rules":
1. foo() is always present: is there a temporary/anonymous main.o that's created? If so, it would include foo().
2. gcc will intelligently figure out unnecessary symbols to exclude.
The above are my hypotheses based on this experiment, but how correct are they? If someone is knowledgeable about the intricacies of how linking works, I'd be grateful for some background information explaining the whys and wherefores of what's going on.
So the linker is able to remove each individual function because each one is in its own section. Enabling this (-ffunction-sections) when compiling your library therefore allows the linker to remove unused functions from the library.
No, not for unused functions with global visibility: the compiler doesn't know whether some other compilation unit references them. Also, most object file formats do not allow functions to be removed after compilation, nor do they give the linker a way to tell whether internal references exist.
--gc-sections decides which input sections are used by examining symbols and relocations. The section containing the entry symbol and all sections containing symbols undefined on the command-line will be kept, as will sections containing symbols referenced by dynamic objects.
It's mostly correct with the caveat that static-library linking doesn't really have per-symbol granularity. It has per-member-object-file granularity.
Example:
If the static library contains the files:
    a.o (defining foo and bar)
    b.o (defining baz)
and an undefined reference to foo needs to be resolved, a.o will be brought in, and with it the bar symbol as well.
You can get the effect of per-symbol granularity when you compile with -ffunction-sections -fdata-sections and then link with -Wl,--gc-sections (gc stands for garbage-collect), but bear in mind that these compiler/linker options are gcc/clang-specific and that they have some minor performance/code-size cost.
-ffunction-sections puts each function in its own section (sort of like its own object file), and -fdata-sections does the same thing for data items such as global and static variables. -Wl,--gc-sections then makes the linker garbage-collect at section granularity: the object files are pulled in as usual, and every section (and therefore every symbol it defines) that is unreachable from the entry point is removed.
(-ffunction-sections is also useful if you want size -A the_objectfile.o to give you function sizes, and if you want those function sizes not to fluctuate slightly with the position of the functions due to alignment requirements.)