The best explanation I was able to find was from the official documentation:
-r --relocateable Generate relocatable output--i.e., generate an output file that can in turn serve as input to ld. This is often called partial linking. As a side effect, in environments that support standard Unix magic numbers, this option also sets the output file's magic number to OMAGIC. If this option is not specified, an absolute file is produced. When linking C++ programs, this option will not resolve references to constructors; to do that, use -Ur. This option does the same thing as `-i'.
I am specifically interested in knowing what happens to the symbols present in inputs to linker. Take a specific case when I have a static library libstatic.a which contains a single object file component.o. Now, I want to create another static library libfinal.a which will work as an interface to libstatic.a. I use this command to create it:
ld -r -o libfinal.a wrapper.o -L. -lstatic
Here wrapper.o provides the only API for calling the functions defined in libstatic.a.
Will libfinal.a be just a combined archive containing wrapper.o and component.o, or will all the references that can be resolved between wrapper.o and component.o be resolved (linked) before the result is placed into libfinal.a?
Edit_1: Updating the question based on the progress made:
The objdump of the component library libstatic.a (objdump -D libstatic.a) shows .text sections separately for each function (as expected), whereas in the combined library libfinal.a, which has been created by partial linking (-r flag), there is just one single .text section. I guess this means that an internal linking has taken place and it's not just creating a plain archive.
Minimal runnable example
Here I produce a minimal example and compile it in two ways to produce functionally identical executables:
- one f12.c file, compiled without partial linking into f12.o
- f1.c and f2.c, which are first partially linked into f12_r.o
main.c
#include <assert.h>
#include <stdlib.h>

int f_1_2(void);
int f_2_1(void);

int main(void) {
    assert(f_1_2() + f_2_1() == 5);
    return EXIT_SUCCESS;
}
f1.c
#include "f1.h"
f2.c
#include "f2.h"
f12.c
#include "f1.h"
#include "f2.h"
f1.h
int f_2(void);

int f_1_2(void) {
    return f_2() + 1;
}

int f_1(void) {
    return 1;
}
f2.h
int f_1(void);

int f_2_1(void) {
    return f_1() + 1;
}

int f_2(void) {
    return 2;
}
run.sh
#!/usr/bin/env bash
set -eux
cflags='-ggdb3 -std=c99 -O0 -fPIE -pie'
gcc $cflags -c -o f1.o f1.c
gcc $cflags -c -o f2.o f2.c
gcc $cflags -c -o f12.o f12.c
ld -o f12_r.o -r f1.o f2.o
gcc $cflags -c -o main.o main.c
gcc $cflags -o main.out f12.o main.o
gcc $cflags -o main_r.out f12_r.o main.o
./main.out
./main_r.out
GitHub upstream.
If we try the same thing but without ld -r, then we get the following warnings:
+ ld -o f12_r.o f1.o f2.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
+ gcc -ggdb3 -std=c99 -O0 -fPIE -pie -o main_r.out f12_r.o main.o
/usr/bin/ld: error in f12_r.o(.eh_frame); no .eh_frame_hdr table will be created
Neither of them makes the tool exit non-zero, and the final executable still runs, so I'm not sure how bad it is. TODO understand.
Binary analysis
If you are not familiar with relocation, first read this: What do linkers do?
The key question is how partial linking could speed up the link. The only thing I could think of was resolving references across the pre-linked files, so I've focused on this for now.
However, it does not do that, as asked at: Resolve relative relocations in partial link, so I would not expect it to speed up the link significantly.
I have confirmed this with:
objdump -S f12.o
objdump -S f12_r.o
both of which produce identical outputs that contain:
int f_1_2(void) {
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
    return f_2() + 1;
   4:   e8 00 00 00 00          callq  9 <f_1_2+0x9>
   9:   83 c0 01                add    $0x1,%eax
}
   c:   5d                      pop    %rbp
   d:   c3                      retq
so we see that the call to f_2 inside f_1_2 has not yet been resolved in either case, because the relative offset is still 0: e8 00 00 00 00 (e8 is the opcode of the call instruction).
This also taught me that GCC does not resolve function calls before the final link either. TODO: what is the rationale, and is it possible to force it to resolve them?
Benchmark
I had benchmarked LD vs GOLD at: Replacing ld with gold - any experience? so I decided to reuse it to see if partial linking leads to any link speedup.
I generated the test objects with this script:
./generate-objects 100 1000 100
and then I started with the most extreme link case possible: pre-link everything except the main file, and then benchmark the final link:
mv main.o ..
ld -o partial.o -r *.o
time gcc partial.o ../main.o
time gcc -fuse-ld=gold partial.o ../main.o
The wall clock time results in seconds were as follows:
            No partial link    Partial link
No Gold     6.15               5.756
Gold        4.06               4.457
Therefore, based on this experiment, it seems that partial linking may not speed up your link time at all (with GOLD it was even slightly slower), and I'd recommend that you just try GOLD first.
Let me know if you can produce a concrete example where incremental linking leads to significant speedup.
Case study: the Linux kernel
The Linux kernel is one example of a large project that used to use incremental linking, so maybe we can learn something from it.
It has since moved to ar T thin archives, as shown at: https://unix.stackexchange.com/questions/5518/what-is-the-difference-between-the-following-kernel-makefile-terms-vmlinux-vml/482978#482978
The initial commit and rationale are at: a5967db9af51a84f5e181600954714a9e4c69f1f (included in v4.9), whose commit message says:
ld -r is an incremental link used to create built-in.o files in build
subdirectories. It produces relocatable object files containing all
its input files, and these are then pulled together and relocated
in the final link. Aside from the bloat, this constrains the final
link relocations, which has bitten large powerpc builds with
unresolvable relocations in the final link.
This is also mentioned at Documentation/process/changes.rst:
Binutils
--------
The build system has, as of 4.13, switched to using thin archives (`ar T`)
rather than incremental linking (`ld -r`) for built-in.a intermediate steps.
This requires binutils 2.20 or newer.
TODO: find out when incremental linking was introduced, and see if there is a minimal test case that we can use to see it going faster: https://unix.stackexchange.com/questions/491312/why-does-the-linux-kernel-build-system-use-incremental-linking-or-ar-t-thin-arch
Tested on Ubuntu 18.10, GCC 8.2.0, Lenovo ThinkPad P51 laptop, Intel Core i7-7820HQ CPU (4 cores / 8 threads), 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB), Samsung MZVLB512HAJQ-000L7 SSD (3,000 MB/s).
ld creates executables and shared libraries, not object file archives (.a files).
ar creates and modifies object file archives.
The -r, --relocateable option is useful when you would like to resolve certain (unresolved) symbols of a .so and produce another .so.