 

Are the optimizations done in LTO the same as in normal compilation?

While compiling a translation unit, the compiler performs a lot of optimizations: inlining, constant folding/propagation, alias analysis, loop unrolling, dead code elimination and many others I haven't even heard of. Are all of them done across multiple translation units when using LTO/LTCG/WPO, or is only a subset (or a variant) of them done (I've heard about inlining)? If not all optimizations are done, I would consider unity builds superior to LTO (or maybe use both when there is more than one unity source file).
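For readers unfamiliar with the term, a unity build is a single translation unit that textually includes several `.cpp` files, so the compiler sees every function body at once. The sketch below is illustrative only: the file names `math_helpers.cpp` and `stats.cpp` and the functions `square` and `sum_of_squares` are hypothetical, and their bodies are pasted inline so the example compiles on its own.

```cpp
// Hypothetical unity-build translation unit. In a real project this file
// would contain only lines such as:
//   #include "math_helpers.cpp"
//   #include "stats.cpp"
// The (hypothetical) contents are pasted inline so the sketch compiles alone.

// --- would come from math_helpers.cpp ---
int square(int x) { return x * x; }

// --- would come from stats.cpp ---
// square() is defined in this same translation unit, so an optimizing
// compiler is free to inline it here without any link-time optimization.
int sum_of_squares(int a, int b) { return square(a) + square(b); }
```

Compiled separately, `stats.cpp` would only see a declaration of `square()`, and inlining it would require LTO.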

My guess is that it's not the same (unity builds having the full set of optimizations) and also that it varies a lot across compilers.

The LTO documentation of each compiler doesn't answer this precisely (or I am failing to understand it).

Since LTO involves saving the intermediate representation in the object files, in theory LTO could do all the optimizations... right?

Note that I am not asking about build speed - that is a separate issue.

EDIT: I am mostly interested in gcc/llvm.

asked Jul 30 '15 10:07 by onqtam





1 Answer

If you look at the GCC documentation, you find:

-flto[=n]

This option runs the standard link-time optimizer. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.

To use the link-time optimizer, -flto and optimization options should be specified at compile time and during the final link. For example:

          gcc -c -O2 -flto foo.c
          gcc -c -O2 -flto bar.c
          gcc -o myprog -flto -O2 foo.o bar.o

The first two invocations to GCC save a bytecode representation of GIMPLE into special ELF sections inside foo.o and bar.o. The final invocation reads the GIMPLE bytecode from foo.o and bar.o, merges the two files into a single internal image, and compiles the result as usual. Since both foo.o and bar.o are merged into a single image, this causes all the interprocedural analyses and optimizations in GCC to work across the two files as if they were a single one. This means, for example, that the inliner is able to inline functions in bar.o into functions in foo.o and vice-versa.

As the documentation says: yes, all optimizations are done as if the program were compiled as a single file. A similar ("same") optimization result can also be obtained with -fwhole-program, which tells GCC to assume that the current compilation unit is the whole program.
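The GCC manual describes -fwhole-program as making every public function and variable except `main` (and anything marked `externally_visible`) behave as if it were `static`. A rough hand-written equivalent is to give helpers internal linkage yourself, as in this sketch. The names `helper` and `entry` are hypothetical, and `entry` stands in for `main` so the snippet can be tested.

```cpp
// Roughly what -fwhole-program assumes: nothing except the entry point is
// referenced from outside this unit. Writing that by hand means giving
// every helper internal linkage (an anonymous namespace or `static`).
namespace {
// Hypothetical helper with internal linkage: no other translation unit can
// call it, so after inlining it at its only call site the compiler may drop
// the out-of-line copy entirely.
int helper(int x) { return x + 1; }
}  // namespace

// Hypothetical entry point standing in for main(); it keeps external linkage.
int entry(int x) { return helper(x) * 2; }
```

Unlike LTO, this only helps when all the relevant code is already in one translation unit.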

If you compile this very simple example:

f1.cpp:

int f1() { return 10; }

f2.cpp:

int f2(int i) { return 2*i; }

main.cpp:

int f1();       // defined in f1.cpp
int f2(int i);  // defined in f2.cpp

int main()
{
    int res = f1();
    res = f2(res);
    res++;

    return res;
}

Building it with LTO, I got this assembler output:

00000000004005e0 <main>:
  4005e0:   b8 15 00 00 00          mov    $0x15,%eax
  4005e5:   c3                      retq   
  4005e6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  4005ed:   00 00 00

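All the code is inlined and constant-folded as expected: f1() returns 10, f2() doubles it to 20, and res++ gives 21, so main just returns 0x15.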
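The folded constant can be double-checked without reading assembly. This sketch uses `constexpr` copies of the example functions (renamed `f1_c`, `f2_c` and `main_body`, which are hypothetical names chosen to avoid clashing with the originals). It assumes C++14 or later, which is required for the local mutation inside a `constexpr` function. The `static_assert` makes the compiler itself confirm that the body of `main` evaluates to 0x15.

```cpp
// constexpr copies of f1.cpp, f2.cpp and the body of main() from main.cpp.
constexpr int f1_c() { return 10; }          // same body as f1()
constexpr int f2_c(int i) { return 2 * i; }  // same body as f2()

// Same statements as main(): 10 -> 20 -> 21.
constexpr int main_body() {
    int res = f1_c();
    res = f2_c(res);
    res++;
    return res;
}

// Compile-time check that matches "mov $0x15,%eax" in the disassembly.
static_assert(main_body() == 0x15, "main() should fold to 0x15 (21)");
```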

In my experience, current GCC optimizes with LTO exactly as if the program were compiled as a single file. In very rare cases I got an ICE while using LTO, but with the current 5.2.0 release I have not seen any ICE again.

ICE: Internal Compiler Error

answered Oct 12 '22 06:10 by Klaus
