gcc LTO: Limit scope of optimization

Question

An LTO build of a rather large shared library (many template instantiations) takes rather long (>10 min). I know a few things about the library, so I could specify a kind of "blacklist" of object files that do not need to be analyzed together (because there are no calls among them that should be inlined, for example), or I could specify groups of object files that should be analyzed together. Is this possible somehow (without splitting up the library)?

Tavian Barnes · Accepted Answer

ld has a little-used feature, -r/--relocatable, that combines multiple object files into one, which can later be linked into the final product. If you can get LTO to happen at this step, but not later, you get the kind of "partial" LTO you're looking for.

Sadly, calling ld -r directly won't work; it just combines all the LTO information to be processed later. But invoking it via the gcc driver (gcc -r) seems to work:

a.c

int a() {
    return 42;
}

b.c

int a(void);

int b() {
    return a();
}

c.c

int b(void);

int c() {
    return b();
}

d.c

int c(void);

int main() {
    return c();
}

$ gcc -O3 -flto -c [a-d].c
$ gcc -O3 -r -nostdlib a.o b.o -o g1.o
$ gcc -O3 -r -nostdlib c.o d.o -o g2.o
$ gcc -O3 -fno-lto g1.o g2.o
$ objdump -d a.out
...
00000000000004f0 <main>:
 4f0:   e9 1b 01 00 00          jmpq   610 <b>
...
0000000000000610 <b>:
 610:   b8 2a 00 00 00          mov    $0x2a,%eax
 615:   c3                      retq   
...

So main() got optimized to return b(); (c() was inlined within the second group), and b() got optimized to return 42; (a() was inlined within the first group), but there were no interprocedural optimizations between the two groups.
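With more than two groups, the same steps can be scripted. Here is a minimal sketch, assuming a POSIX shell; the function name partial_lto_cmd and the output names are hypothetical, and the flags are simply the ones from the example above. It only prints the command lines, so you can inspect them before piping the output to sh.

```shell
#!/bin/sh
# Print the gcc command that runs LTO over one group of object files and
# emits a single relocatable object (same flags as in the example above).
# Usage: partial_lto_cmd OUTPUT.o INPUT.o...
# (partial_lto_cmd is a hypothetical helper name, not part of gcc.)
partial_lto_cmd() {
    out=$1
    shift
    echo "gcc -O3 -r -nostdlib $* -o $out"
}

partial_lto_cmd g1.o a.o b.o
partial_lto_cmd g2.o c.o d.o
# Final link without LTO, so nothing is optimized across groups.
echo "gcc -O3 -fno-lto g1.o g2.o"
```

Newer GCC releases document a -flinker-output option, and its nolto-rel mode is described as forcing final code generation during an incremental link. Whether you need to pass it explicitly depends on your GCC version, so check that g1.o and g2.o really contain machine code before relying on this.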

Hadi Brais · Answer

Assume that you want to optimize a.c and b.c together as one group and c.c and d.c as another group. You can use the -combine GCC switch as follows:

$ gcc -O3 -c -combine a.c b.c -o group1.o
$ gcc -O3 -c -combine c.c d.c -o group2.o

Note that you don't need to use LTO because the -combine switch combines multiple source code files before optimizing the code.

Edit

-combine is only supported for C code, and as far as I know it was removed in GCC 4.6, so it is only an option with older compilers. An alternative way to achieve this is to use the #include directive as follows:

// file group1.cpp
#include "a.cpp"
#include "b.cpp"

// file group2.cpp
#include "c.cpp"
#include "d.cpp"

Then they can be compiled without using LTO as follows:

g++ -O3 group1.cpp group2.cpp

This effectively emulates grouped or partial LTO.
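If the groups change often, the group files can be generated instead of maintained by hand. A minimal sketch, assuming a POSIX shell; make_group is a hypothetical helper name, and the file names are the ones used above.

```shell
#!/bin/sh
# Print a "unity" translation unit that #includes every source file given.
# Usage: make_group FILE.cpp... > groupN.cpp
# (make_group is a hypothetical helper name.)
make_group() {
    for src in "$@"; do
        printf '#include "%s"\n' "$src"
    done
}

make_group a.cpp b.cpp > group1.cpp
make_group c.cpp d.cpp > group2.cpp
```

Keep in mind that all files in a group end up in one translation unit, so file-local names (static functions, anonymous namespaces, macros) that are defined in more than one source file of the same group can clash.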

However, it's not clear whether this technique or the one proposed in the other answer compiles faster. The code may also not be optimized in exactly the same way, so compare the performance of the resulting code for each technique and use the one that works better.
