
What is Partial Linking in GNU Linker?

Tags: c, linker

The best explanation I was able to find is in the official documentation:

-r
--relocateable
    Generate relocatable output--i.e., generate an output file that can in turn serve as input to ld. This is often called partial linking. As a side effect, in environments that support standard Unix magic numbers, this option also sets the output file's magic number to OMAGIC. If this option is not specified, an absolute file is produced. When linking C++ programs, this option will not resolve references to constructors; to do that, use -Ur. This option does the same thing as `-i'.

I am specifically interested in knowing what happens to the symbols present in the inputs to the linker. Take the specific case where I have a static library libstatic.a that contains a single object file, component.o. Now I want to create another static library, libfinal.a, which will work as an interface to libstatic.a. I use this command to create it:

ld -r -o libfinal.a wrapper.o -L. -lstatic

Here wrapper.o provides the exclusive APIs used to call the functions defined in libstatic.a.

Will libfinal.a be just a combined archive containing wrapper.o and component.o, or will all the references that can be resolved between wrapper.o and component.o actually be resolved (linked) and the result then placed into libfinal.a?

Edit_1: Updating the question based on the progress made. The objdump of the component library libstatic.a (objdump -D libstatic.a) shows separate .text sections for each function (as expected), whereas the combined library libfinal.a, which was created by partial linking (the -r flag), has just one single .text section. I guess this means that some internal linking has taken place and it is not just creating a plain archive.
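For reference, the same kind of check can also be done with nm and readelf (exact output depends on the binutils version): nm lists which symbols are still undefined (marked 'U'), and readelf -r lists the relocations that remain to be applied at the final link:

nm libstatic.a
nm libfinal.a
readelf -r libfinal.a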

asked Apr 01 '15 by MediocreMyna


2 Answers

Minimal runnable example

Here I produce a minimal example and compile it in two ways to produce functionally identical executables:

  • one combined f12.c file, without partial linking, compiled into f12.o
  • two separate files f1.c and f2.c, which are first partially linked into f12_r.o

main.c

#include <assert.h>
#include <stdlib.h>

int f_1_2(void);
int f_2_1(void);

int main(void) {
    assert(f_1_2() + f_2_1() == 5);
    return EXIT_SUCCESS;
}

f1.c

#include "f1.h"

f2.c

#include "f2.h"

f12.c

#include "f1.h"
#include "f2.h"

f1.h

int f_2(void);

int f_1_2(void) {
    return f_2() + 1;
}

int f_1(void) {
    return 1;
}

f2.h

int f_1(void);

int f_2_1(void) {
    return f_1() + 1;
}

int f_2(void) {
    return 2;
}

run.sh

#!/usr/bin/env bash
set -eux
cflags='-ggdb3 -std=c99 -O0 -fPIE -pie'
gcc $cflags -c -o f1.o f1.c
gcc $cflags -c -o f2.o f2.c
gcc $cflags -c -o f12.o f12.c
ld -o f12_r.o -r f1.o f2.o
gcc $cflags -c -o main.o main.c
gcc $cflags -o main.out f12.o main.o
gcc $cflags -o main_r.out f12_r.o main.o
./main.out
./main_r.out

GitHub upstream.

If we try the same thing but without ld -r, then we get the following warnings:

+ ld -o f12_r.o f1.o f2.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
+ gcc -ggdb3 -std=c99 -O0 -fPIE -pie -o main_r.out f12_r.o main.o
/usr/bin/ld: error in f12_r.o(.eh_frame); no .eh_frame_hdr table will be created

None of these messages makes the tools exit non-0, and the final executable still runs, so I'm not sure how bad it is. TODO: understand.

Binary analysis

If you are not familiar with relocation, first read this: What do linkers do?

The key question is how partial linking could speed up the final link at all. The only mechanism I could think of was resolving references across the pre-linked files, so that is what I focused on first.

However, it does not do that, as discussed at: Resolve relative relocations in partial link, so I would not expect it to speed up the link significantly.

I have confirmed this with:

objdump -S f12.o
objdump -S f12_r.o

both of which produce identical outputs that contain:

int f_1_2(void) {
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
    return f_2() + 1;
   4:   e8 00 00 00 00          callq  9 <f_1_2+0x9>
   9:   83 c0 01                add    $0x1,%eax
}
   c:   5d                      pop    %rbp
   d:   c3                      retq

so we see that the call to f_2 has not yet been resolved in either case, because the relative offset in the call instruction is still 0: e8 00 00 00 00 (e8 is the call opcode).

This also taught me that GCC does not resolve such function calls before the final link either, even when the caller and callee end up in the same object file (TODO: what is the rationale, and is it possible to force it to resolve them?).
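The same thing can be double checked without reading the disassembly by listing the pending relocations; both files should still contain a relocation entry for the call to f_2 (the exact relocation type, e.g. R_X86_64_PLT32 vs R_X86_64_PC32, depends on the GCC and binutils versions):

readelf -r f12.o
readelf -r f12_r.o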

Benchmark

I had benchmarked ld vs gold at: Replacing ld with gold - any experience?, so I decided to reuse that setup to see if partial linking leads to any link speedup.

I generated the test objects with this script:

./generate-objects 100 1000 100

and then I started with the most extreme link case possible: pre-link everything except the main file, and then benchmark the final link:

mv main.o ..
ld -o partial.o -r *.o
time gcc               partial.o ../main.o
time gcc -fuse-ld=gold partial.o ../main.o

The wall clock time results in seconds were as follows:

          No partial link   Partial link
No Gold   6.15              5.756
Gold      4.06              4.457

Therefore:

  • the time difference exists, but is not very significant
  • without gold the partially linked build went faster, but with gold it became slower!

Therefore, based on this experiment, it seems that partial linking may not speed up your link time at all, and I'd recommend simply trying gold first instead.

Let me know if you can produce a concrete example where incremental linking leads to a significant speedup.

Case study: the Linux kernel

The Linux kernel is one example of a large project that used to use incremental linking, so maybe we can learn something from it.

It has since moved to ar T thin archives as shown at: https://unix.stackexchange.com/questions/5518/what-is-the-difference-between-the-following-kernel-makefile-terms-vmlinux-vml/482978#482978

The initial commit and rationale are at a5967db9af51a84f5e181600954714a9e4c69f1f (included in v4.9), whose commit message says:

ld -r is an incremental link used to create built-in.o files in build
subdirectories. It produces relocatable object files containing all
its input files, and these are then pulled together and relocated
in the final link. Aside from the bloat, this constrains the final
link relocations, which has bitten large powerpc builds with
unresolvable relocations in the final link.

This is also mentioned in Documentation/process/changes.rst:

Binutils
--------

The build system has, as of 4.13, switched to using thin archives (`ar T`)
rather than incremental linking (`ld -r`) for built-in.a intermediate steps.
This requires binutils 2.20 or newer.
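As a rough sketch of the difference, with made-up file names: the old approach merged the objects of a subdirectory into a single relocatable built-in.o, while a thin archive (the T modifier of GNU ar) merely records references to the member objects instead of copying their contents:

# old approach: one merged relocatable object per subdirectory
ld -r -o built-in.o foo.o bar.o

# new approach: thin archive that only points at foo.o and bar.o
ar rcsT built-in.a foo.o bar.o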

TODO: find out when incremental linking was introduced, and see if there is a minimal test case that we can use to see it going faster: https://unix.stackexchange.com/questions/491312/why-does-the-linux-kernel-build-system-use-incremental-linking-or-ar-t-thin-arch

Tested on Ubuntu 18.10, GCC 8.2.0, Lenovo ThinkPad P51 laptop, Intel Core i7-7820HQ CPU (4 cores / 8 threads), 2x Samsung M471A2K43BB1-CRC RAM (2x 16GiB), Samsung MZVLB512HAJQ-000L7 SSD (3,000 MB/s).


ld creates executables and shared libraries, not object file archives (.a files).

ar creates and modifies object file archives.
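A quick way to see the difference in practice (libreal.a is a made-up name for the plain-archive variant; component.o would first be extracted from libstatic.a, e.g. with ar x libstatic.a; the file output is paraphrased):

ld -r -o libfinal.a wrapper.o component.o
file libfinal.a    # ELF relocatable object, despite the .a name

ar rcs libreal.a wrapper.o component.o
file libreal.a     # an actual ar archive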


The -r, --relocateable option is useful when you would like to resolve certain (unresolved) symbols of a .so and produce another .so.
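A sketch of that workflow with made-up file names: the objects are first combined with -r, which resolves the symbol definitions between them (the relocations themselves are still applied at the final link, as shown in the other answer), and the combined object is then fed into the shared-library link:

# a.o and b.o are assumed to have been compiled with -fPIC
ld -r -o combined.o a.o b.o

# symbols that a.o and b.o provide to each other are now defined in combined.o
gcc -shared -o libcombined.so combined.o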

answered Oct 04 '22 by Maxim Egorushkin