
Can the linker inline functions?

In the file file1.c, there is a call to a function that is implemented in the file file2.c. When I link file1.o and file2.o into an executable, if the function in file2 is very small, will the linker automatically detect that the function is small and inline its call?

asked May 13 '11 03:05 by Squall




1 Answer

In addition to the support for Link Time Code Generation (LTCG) that James McNellis mentioned, the GCC toolchain also supports link time optimization. Starting with version 4.5, GCC supports the -flto switch, which enables Link Time Optimization (LTO), a form of whole program optimization that lets it inline functions from separate object files (along with whatever other optimizations a compiler could make if it were compiling all the object files as though they came from a single C source file).

Here's a simple example:

test.c:

    void print_int(int x);

    int main(){
        print_int(1);
        print_int(42);
        print_int(-1);

        return 0;
    }

print_int.c:

    #include <stdio.h>

    void print_int( int x)
    {
        printf( "the int is %d\n", x);
    }

First compile them using GCC 4.5.x. The examples in the GCC docs use -O2, but to get visible results in my simple test, I had to use -O3:

    C:\temp>gcc --version
    gcc (GCC) 4.5.2

    # compile with preparation for LTO
    C:\temp>gcc -c -O3 -flto test.c
    C:\temp>gcc -c -O3 -flto print_int.c

    # link without LTO
    C:\temp>gcc -o test-nolto.exe  print_int.o test.o

To get the effect of LTO you're supposed to use the optimization options even at the link stage - the linker actually invokes the compiler to compile pieces of intermediate code that the compiler put into the object file in the first steps above. If you don't pass the optimization option at this stage as well, the compiler won't perform the inlining that you'd be looking for.

    # link using LTO
    C:\temp>gcc -o test-lto.exe -flto -O3 print_int.o test.o

Disassembly of the version without link time optimization. Note that the calls are made to the print_int() function:

    C:\temp>gdb test-nolto.exe
    GNU gdb (GDB) 7.2
    (gdb) start
    Temporary breakpoint 1 at 0x401373
    Starting program: C:\temp/test-nolto.exe
    [New Thread 3324.0xdc0]

    Temporary breakpoint 1, 0x00401373 in main ()
    (gdb) disassem
    Dump of assembler code for function main:
       0x00401370 <+0>:     push   %ebp
       0x00401371 <+1>:     mov    %esp,%ebp
    => 0x00401373 <+3>:     and    $0xfffffff0,%esp
       0x00401376 <+6>:     sub    $0x10,%esp
       0x00401379 <+9>:     call   0x4018ca <__main>
       0x0040137e <+14>:    movl   $0x1,(%esp)
       0x00401385 <+21>:    call   0x401350 <print_int>
       0x0040138a <+26>:    movl   $0x2a,(%esp)
       0x00401391 <+33>:    call   0x401350 <print_int>
       0x00401396 <+38>:    movl   $0xffffffff,(%esp)
       0x0040139d <+45>:    call   0x401350 <print_int>
       0x004013a2 <+50>:    xor    %eax,%eax
       0x004013a4 <+52>:    leave
       0x004013a5 <+53>:    ret

Disassembly of the version with link time optimization. Note that the calls to printf() are made directly:

    C:\temp>gdb test-lto.exe
    GNU gdb (GDB) 7.2
    (gdb) start
    Temporary breakpoint 1 at 0x401373
    Starting program: C:\temp/test-lto.exe
    [New Thread 1768.0x126c]

    Temporary breakpoint 1, 0x00401373 in main ()
    (gdb) disassem
    Dump of assembler code for function main:
       0x00401370 <+0>:     push   %ebp
       0x00401371 <+1>:     mov    %esp,%ebp
    => 0x00401373 <+3>:     and    $0xfffffff0,%esp
       0x00401376 <+6>:     sub    $0x10,%esp
       0x00401379 <+9>:     call   0x4018da <__main>
       0x0040137e <+14>:    movl   $0x1,0x4(%esp)
       0x00401386 <+22>:    movl   $0x403064,(%esp)
       0x0040138d <+29>:    call   0x401acc <printf>
       0x00401392 <+34>:    movl   $0x2a,0x4(%esp)
       0x0040139a <+42>:    movl   $0x403064,(%esp)
       0x004013a1 <+49>:    call   0x401acc <printf>
       0x004013a6 <+54>:    movl   $0xffffffff,0x4(%esp)
       0x004013ae <+62>:    movl   $0x403064,(%esp)
       0x004013b5 <+69>:    call   0x401acc <printf>
       0x004013ba <+74>:    xor    %eax,%eax
       0x004013bc <+76>:    leave
       0x004013bd <+77>:    ret
    End of assembler dump.
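To picture why the LTO build ends up calling printf() directly, it can help to write the inlining out in source form. The sketch below is only an illustration under my own assumptions: GCC performs this on intermediate code rather than on C text, the names format_int, emit_with_call and emit_inlined are made up for this example, and snprintf stands in for printf so that the result can be checked.

```c
#include <stdio.h>

/* Hypothetical stand-in for print_int(): same format string, but it writes
   into a caller-supplied buffer so the result can be inspected. */
static int format_int(char *buf, size_t size, int x)
{
    return snprintf(buf, size, "the int is %d\n", x);
}

/* Before inlining: the caller makes a real call to the helper. */
int emit_with_call(char *buf, size_t size, int x)
{
    return format_int(buf, size, x);
}

/* After inlining: the helper's body has been substituted into the caller,
   so only the library call remains. */
int emit_inlined(char *buf, size_t size, int x)
{
    return snprintf(buf, size, "the int is %d\n", x);
}
```

Both functions produce the same text. The only difference is whether a call to the helper survives in the generated code, which is what the two disassemblies above show. Within a single file an optimizing compiler would probably inline format_int on its own; LTO extends that to functions defined in other object files.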

And here's the same experiment with MSVC (first with LTCG):

    C:\temp>cl -c /GL /Zi /Ox test.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.

    test.c

    C:\temp>cl -c /GL /Zi /Ox print_int.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.

    print_int.c

    C:\temp>link /LTCG test.obj print_int.obj /out:test-ltcg.exe /debug
    Microsoft (R) Incremental Linker Version 10.00.40219.01
    Copyright (C) Microsoft Corporation.  All rights reserved.

    Generating code
    Finished generating code

    C:\temp>"\Program Files (x86)\Debugging Tools for Windows (x86)"\cdb test-ltcg.exe

    Microsoft (R) Windows Debugger Version 6.12.0002.633 X86
    Copyright (c) Microsoft Corporation. All rights reserved.

    CommandLine: test-ltcg.exe
        // ...
    0:000> u main
    *** WARNING: Unable to verify checksum for test-ltcg.exe
    test_ltcg!main:
    00cd1c20 6a01            push    1
    00cd1c22 68d05dcd00      push    offset test_ltcg!__decimal_point_length+0x10 (00cd5dd0)
    00cd1c27 e8e3f3feff      call    test_ltcg!printf (00cc100f)
    00cd1c2c 6a2a            push    2Ah
    00cd1c2e 68d05dcd00      push    offset test_ltcg!__decimal_point_length+0x10 (00cd5dd0)
    00cd1c33 e8d7f3feff      call    test_ltcg!printf (00cc100f)
    00cd1c38 6aff            push    0FFFFFFFFh
    00cd1c3a 68d05dcd00      push    offset test_ltcg!__decimal_point_length+0x10 (00cd5dd0)
    00cd1c3f e8cbf3feff      call    test_ltcg!printf (00cc100f)
    00cd1c44 83c418          add     esp,18h
    00cd1c47 33c0            xor     eax,eax
    00cd1c49 c3              ret
    0:000>

Now without LTCG. Note that with MSVC you have to compile the .c file without the /GL to prevent the linker from performing LTCG - otherwise the linker detects that /GL was specified, and it'll force the /LTCG option (hey, that's what you said you wanted the first time around with /GL):

    C:\temp>cl -c /Zi /Ox test.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.

    test.c

    C:\temp>cl -c /Zi /Ox print_int.c
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.

    print_int.c

    C:\temp>link test.obj print_int.obj /out:test-noltcg.exe /debug
    Microsoft (R) Incremental Linker Version 10.00.40219.01
    Copyright (C) Microsoft Corporation.  All rights reserved.

    C:\temp>"\Program Files (x86)\Debugging Tools for Windows (x86)"\cdb test-noltcg.exe

    Microsoft (R) Windows Debugger Version 6.12.0002.633 X86
    Copyright (c) Microsoft Corporation. All rights reserved.

    CommandLine: test-noltcg.exe
    // ...
    0:000> u main
    test_noltcg!main:
    00c41020 6a01            push    1
    00c41022 e8e3ffffff      call    test_noltcg!ILT+5(_print_int) (00c4100a)
    00c41027 6a2a            push    2Ah
    00c41029 e8dcffffff      call    test_noltcg!ILT+5(_print_int) (00c4100a)
    00c4102e 6aff            push    0FFFFFFFFh
    00c41030 e8d5ffffff      call    test_noltcg!ILT+5(_print_int) (00c4100a)
    00c41035 83c40c          add     esp,0Ch
    00c41038 33c0            xor     eax,eax
    00c4103a c3              ret
    0:000>

One thing that Microsoft's linker supports in LTCG that is not supported by GCC (as far as I know) is Profile Guided Optimization (PGO). That technology allows Microsoft's linker to optimize based on profiling data gathered from previous runs of the program. This lets the linker do things such as gather 'hot' functions onto the same memory pages and move seldom-used code sequences onto other memory pages, reducing the working set of the program.

 


Edit (28 Aug 2011): GCC supports profile guided optimization using such options as -fprofile-generate and -fprofile-use, but I'm completely uninformed about them.

Thanks to Konrad Rudolph for pointing this out to me.
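For anyone who wants to experiment with the GCC options named in the edit, here is a minimal sketch of the usual three-step workflow: instrumented build, training run, rebuild with the recorded profile. The file name pgo_demo.c, the function names, and the premise that negative inputs are rare are all invented for illustration, and the commands in the comment assume the file also contains a main() that feeds representative data to sum_abs.

```c
#include <stddef.h>

/*
 * Assumed GCC workflow (file and program names are hypothetical):
 *   gcc -O2 -fprofile-generate pgo_demo.c -o pgo_demo   instrumented build
 *   ./pgo_demo                                          training run, writes profile data
 *   gcc -O2 -fprofile-use pgo_demo.c -o pgo_demo        rebuild using the recorded profile
 */

/* Rarely taken path: with typical input a profile would mark it as cold. */
static long handle_negative(long v)
{
    return -v;
}

/* Sums absolute values; negative entries are assumed to be uncommon. */
long sum_abs(const long *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; ++i) {
        if (values[i] < 0)
            total += handle_negative(values[i]);  /* cold branch */
        else
            total += values[i];                   /* hot branch */
    }
    return total;
}
```

Whether the profile changes the generated code for something this small is not guaranteed; the point is only to show where the two switches fit in a build.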

answered Sep 17 '22 07:09 by Michael Burr