
Suggest to the compiler to selectively inline function calls

Tags: c++, inline

Suppose I have the following code:

struct Foo {
  void helper() { ... }
  void fast_path() { ...; helper(); ... }
  void slow_path1() { ...; helper(); ... }
  void slow_path2() { ...; helper(); ... }
};

The method fast_path() is performance critical and so every (reasonable) effort should be made to make it as fast as possible. The methods slow_path1() and slow_path2() are not performance critical.

From my understanding, a typical compiler might look at this code and decide not to inline helper() if it is complex enough, in order to reduce total code size, since helper() is shared between multiple methods. That same compiler might inline helper() if the slow-path methods did not exist.

Given our desired performance characteristics, we want the compiler to inline the call to helper() inside fast_path(), but prefer the compiler's default behavior in slow_path1() and slow_path2().

One workaround is to have the slow-path function definitions and the call to fast_path() live in separate compilation units, so that the compiler never sees a usage of helper() shared with fast_path(). But maintaining this separation requires special care and cannot be enforced through the compiler. Plus, the proliferation of files (Foo.h, FooINLINES.cpp, and now also Foo.cpp) is undesirable, and the additional compilation units complicate the build of what perhaps could have been a header-only library.

Is there a better way?

Ideally I'd want a new do_not_inline_function_calls_inside_me C++ keyword that I could use like this:

  do_not_inline_function_calls_inside_me void slow_path1() { ... }
  do_not_inline_function_calls_inside_me void slow_path2() { ... }

Alternatively, an inline_function_calls_inside_me keyword, like this:

  inline_function_calls_inside_me void fast_path() { ... }

Note that these hypothetical keywords decorate the *_path*() methods, not the helper() method.

An example context where you might have these sorts of performance demands is a programming competition where each participant writes an application that listens to sparse global data broadcasts of types A and B. When a type-B broadcast is received, each application must perform a computation that depends on the sequence of previously broadcast type-A messages, and submit the result to a central server. The first correct responder to each type-B broadcast scores a point. The nature of the computational problem might allow precomputation to be performed on the type-A updates; there is no advantage to doing those quickly.

dshin asked Jan 05 '16 01:01


1 Answer

Generally speaking, you should not try to be smarter than the compiler. Modern compilers do an awesome job at deciding how to inline functions and humans are notoriously bad at reasoning about this.

In my experience, the best you can do is to define all relevant functions as inline functions in the same translation unit, so that the compiler can see their definitions and inline them as it sees fit. Leave the final decision whether to inline a given function to the compiler, however, and use “forced inline” very sparingly, reserving it for cases where you have evidence that it has a beneficial effect in a given situation.
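For the rare case where measurements do justify a forced inline, here is a minimal sketch of how it could be limited to the fast path. It assumes GCC or Clang, whose always_inline attribute is what "forced inline" usually means there. The name helper_inlined and the arithmetic in its body are hypothetical placeholders, not part of the question's code.

```cpp
#include <cstdint>

// Hypothetical stand-in for the question's Foo; the arithmetic is made up.
struct Foo {
  std::uint64_t state = 1;

  // The real helper body, forced inline at every direct call site
  // (GCC/Clang attribute; an assumption about the toolchain).
  __attribute__((always_inline)) inline void helper_inlined() {
    state = state * 6364136223846793005ULL + 1442695040888963407ULL;
  }

  // Ordinary wrapper. Whether calls to it are inlined is left to the
  // compiler's default heuristics.
  void helper() { helper_inlined(); }

  void fast_path()  { helper_inlined(); }      // helper body is inlined here
  void slow_path1() { helper(); }              // compiler decides
  void slow_path2() { helper(); helper(); }    // compiler decides
};
```

Both routes run the same code, so a fast_path() call and a slow_path1() call leave state with the same value. Only the inlining decision differs, and it is still worth confirming with a profiler that the forced version is actually faster.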

To make the compiler's job easier, you can provide it with additional information about your program. In GCC and Clang, you can use function attributes for this.

struct Foo {
  void helper();
  void fast_path()  __attribute__ ((hot));
  void slow_path1() __attribute__ ((cold));
  void slow_path2() __attribute__ ((cold));
};

inline void Foo::helper()     { … }
inline void Foo::fast_path()  { … }
inline void Foo::slow_path1() { … }
inline void Foo::slow_path2() { … }

This hints to the compiler that it should optimize Foo::fast_path more aggressively for speed, and Foo::slow_path1 and Foo::slow_path2 for a small cache footprint. If any of these functions calls Foo::helper, the compiler can decide on a case-by-case basis whether to inline it or not. (See the documentation in the linked manual for the precise effect of the annotations.)
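The question's hypothetical inline_function_calls_inside_me keyword has a close counterpart among the same function attributes: the GCC manual documents a flatten attribute that asks for every call inside the annotated function to be inlined where possible. The following is only a sketch, assuming GCC or Clang. The int parameter and the function bodies are made-up placeholders that fill in the elided parts so the snippet compiles.

```cpp
// Hypothetical bodies; only the attribute placement is the point.
struct Foo {
  long total = 0;
  int  calls = 0;

  void helper(int x);
  // hot: optimize for speed; flatten: inline the calls made inside, if possible.
  void fast_path(int x)  __attribute__((hot, flatten));
  void slow_path1(int x) __attribute__((cold));
  void slow_path2(int x) __attribute__((cold));
};

inline void Foo::helper(int x)     { total += x; ++calls; }
inline void Foo::fast_path(int x)  { helper(x); }
inline void Foo::slow_path1(int x) { helper(x); helper(-x); }
inline void Foo::slow_path2(int x) { helper(2 * x); }
```

Like the other attributes, flatten is a request rather than a guarantee, so inspecting the generated assembly or profiling is still the way to check what the compiler did.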

An even better way to hint the compiler is to give it actual profiling data. With GCC, you can compile your program with the -fprofile-generate option. This instruments your binary with code that collects profile statistics. Then run your program with a representative set of inputs. Doing so creates *.gcda files with the collected profile data. Finally, re-compile with the -fprofile-use option. GCC will use the collected profile information to decide which paths in your code are hot and how they interact with each other. This technique is known as profile-guided optimization (PGO).
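The three steps above can be sketched on a toy workload. The file name pgo_demo.cpp, the Counter type, and the 1-in-100 message mix are all hypothetical, loosely modeled on the type-A/type-B scenario from the question. The build commands appear as comments and use only the two GCC options named above.

```cpp
// Build steps (file and binary names are hypothetical):
//   g++ -O2 -fprofile-generate pgo_demo.cpp -o pgo_demo   // instrumented build
//   ./pgo_demo                                            // run on representative input
//   g++ -O2 -fprofile-use pgo_demo.cpp -o pgo_demo        // rebuild using the profile
#include <cstdint>

// Made-up workload: type-A messages only accumulate state,
// rare type-B messages produce an answer from that state.
struct Counter {
  std::uint64_t sum = 0;
  std::uint64_t answers = 0;
  void on_a(std::uint32_t v) { sum += v; }
  void on_b()                { answers += sum; }
};

// Feeds n messages through the counter; every 100th message is type B.
std::uint64_t run_workload(std::uint32_t n) {
  Counter c;
  for (std::uint32_t i = 0; i < n; ++i) {
    if (i % 100 == 99) c.on_b();
    else               c.on_a(i);
  }
  return c.answers;
}
```

A main() that calls run_workload with a realistic message count would serve as the training run. The profile is only as good as that input, so a mix that does not resemble production traffic can steer the optimizer in the wrong direction.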

Of course, if you're worried about such things, first make sure that you enable an appropriate optimization level (such as -O2). Template-heavy C++ code in particular (i.e., almost everything that uses the standard library or Boost) can generate really ugly machine code when compiled without decent optimization. Also consider whether you want assertions compiled into your code; defining NDEBUG (-DNDEBUG) disables assert().
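As a small illustration of the last point, the sketch below shows what -DNDEBUG changes. The functions checked_div and assertions_enabled are hypothetical names used only for this example.

```cpp
#include <cassert>

// Hypothetical helper: the assert documents a precondition and is
// compiled out entirely when NDEBUG is defined (e.g. via -DNDEBUG).
inline int checked_div(int a, int b) {
  assert(b != 0 && "divisor must be non-zero");
  return a / b;
}

// Reports at compile time whether assert() is active in this build.
constexpr bool assertions_enabled() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}
```

In a build without -DNDEBUG, calling checked_div with a zero divisor aborts with a diagnostic. With -DNDEBUG the check disappears, which removes its cost from the hot path but also removes the safety net.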

5gon12eder answered Jan 02 '23 11:01