An LTO build of a rather large shared library (many template instantiations) takes quite long (>10 min). I know a few things about the library, and could specify some kind of "blacklist" of object files that do not need to be analyzed together (because there are no calls among them that should be inlined), or I could specify groups of object files that should be analyzed together. Is this possible somehow (without splitting up the library)?
There is a little-used feature of ld called -r/--relocatable that can be used to combine multiple object files into one, which can later be linked into the final product. If you can get LTO to happen at this step, but not later, you get the kind of "partial" LTO you're looking for.
Sadly, ld -r won't work; it just combines all the LTO information to be processed later. But invoking it via the gcc driver (gcc -r) seems to work:
a.c
int a() { return 42; }
b.c
int a(void); int b() { return a(); }
c.c
int b(void); int c() { return b(); }
d.c
int c(void); int main() { return c(); }
$ gcc -O3 -flto -c [a-d].c
$ gcc -O3 -r -nostdlib a.o b.o -o g1.o
$ gcc -O3 -r -nostdlib c.o d.o -o g2.o
$ gcc -O3 -fno-lto g1.o g2.o
$ objdump -d a.out
...
00000000000004f0 <main>:
4f0: e9 1b 01 00 00 jmpq 610 <b>
...
0000000000000610 <b>:
610: b8 2a 00 00 00 mov $0x2a,%eax
615: c3 retq
...
So main() got optimized to "return b();" and b() got optimized to "return 42;", but there were no interprocedural optimizations between the two groups.
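The grouping step above can be wrapped in a small helper so that each group is a single call. This is only a sketch: the function name lto_group and the variables CC, OPT, EXTRA and DRY_RUN are made up for illustration and are not part of GCC. EXTRA is there because newer GCC releases document a -flinker-output option, and on those versions the -r link may keep the intermediate LTO code unless -flinker-output=nolto-rel is passed; whether you need it depends on your GCC version, so treat that as an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helper: pre-link one group of LTO objects into a single
# relocatable object, mirroring the `gcc -O3 -r -nostdlib ...` step above.
# CC, OPT, EXTRA and DRY_RUN are illustrative names, not GCC features.
CC=${CC:-gcc}
OPT=${OPT:--O3}
EXTRA=${EXTRA:-}

# usage: lto_group OUT.o IN1.o [IN2.o ...]
lto_group() {
    out=$1
    shift
    # Build the command once; $OPT and $EXTRA are left unquoted on purpose
    # so that each of them may hold several flags.
    set -- "$CC" $OPT $EXTRA -r -nostdlib "$@" -o "$out"
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print the command that would be run.
        echo "$*"
    else
        "$@"
    fi
}

# Example, matching the two groups above
# (the inputs must come from `gcc -O3 -flto -c`):
#   lto_group g1.o a.o b.o
#   lto_group g2.o c.o d.o
#   gcc -O3 -fno-lto g1.o g2.o
```

Running it with DRY_RUN=1 first shows the exact gcc command line for each group before anything is built.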
Assume that you want to optimize a.c and b.c together as one group and c.c and d.c as another group. You can use the -combine GCC switch as follows:
$ gcc -O3 -c -combine a.c b.c -o group1.o
$ gcc -O3 -c -combine c.c d.c -o group2.o
Note that you don't need to use LTO because the -combine switch combines multiple source code files before optimizing the code.
Edit: -combine is only supported for C code (and, as far as I know, it was removed altogether in GCC 4.6, so it is only an option with older compilers). An alternative way to achieve this would be using the #include directive as follows:
// file group1.cpp
#include "a.cpp"
#include "b.cpp"
// file group2.cpp
#include "c.cpp"
#include "d.cpp"
Then they can be compiled without using LTO as follows:
g++ -O3 group1.cpp group2.cpp
This effectively emulates grouped or partial LTO.
However, it's not clear whether this technique or the one proposed in the other answer compiles faster, and the code may not be optimized in exactly the same way. So compare both the build time and the performance of the resulting code for each technique, then use whichever works better.
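For the #include approach, the group files can be generated rather than written by hand. The sketch below assumes a hypothetical helper named make_group that simply prints one #include line per source file. Keep in mind that every file in a group ends up in a single translation unit, so file-local names (static functions, anonymous namespaces) and macros from different files can clash even though they were fine when compiled separately.

```shell
#!/bin/sh
# Hypothetical helper: emit a "group" translation unit that #includes the
# given source files, in the style of group1.cpp above.
# usage: make_group SRC1 [SRC2 ...] > groupN.cpp
make_group() {
    for src in "$@"; do
        printf '#include "%s"\n' "$src"
    done
}

# Example, matching the groups above:
#   make_group a.cpp b.cpp > group1.cpp
#   make_group c.cpp d.cpp > group2.cpp
#   g++ -O3 group1.cpp group2.cpp
```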