 

How well do linkers cope with functions that return quickly?

In C, if I have a function call that looks like

// main.c
...
do_work_on_object(object, arg1, arg2);
...

// object.c
void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
  if(object == NULL)
  {
    return;
  }
  // do lots of work
}

then the compiler will generate a fair amount of code in main.o to save state, pass the parameters (hopefully in registers in this case), and restore state after the call.

However, at link time it can be observed that arg1 and arg2 are not used in the quick-return path, so the clean-up and state restoration can be short-circuited. Do linkers tend to do this kind of thing automatically, or would one need to turn on link-time optimization (LTO) to get that kind of thing to work?

(Yes, I could inspect the disassembled code, but I'm interested in the behaviours of compilers and linkers in general, and on multiple architectures, so hoping to learn from others' experience.)
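For reference, with GCC or Clang, link-time optimization is enabled by passing -flto both when compiling and when linking, along the lines of:

# hypothetical build of the two files above with LTO enabled (GCC shown; Clang is analogous)
gcc -O2 -flto -c main.c
gcc -O2 -flto -c object.c
gcc -O2 -flto main.o object.o -o prog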

Assuming that profiling shows this function call is worth optimizing, should we expect the following code to be noticeably faster (e.g. without the need to use LTO)?

// main.c
...
if(object != NULL)
{
  do_work_on_object(object, arg1, arg2);
}
...

// object.c
void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
  assert(object != NULL); // generates no code in a release (NDEBUG) build
  // do lots of work
}
asked Apr 17 '15 by mabraham

1 Answer

Some compilers (like GCC and clang) are able to do "shrink-wrap" optimization to delay saving call-preserved regs until after a possible early-out, if they're able to spot the pattern. But some don't, e.g. apparently MSVC 16.11 still doesn't.

I don't think any do partial inlining of just the early-out check into the caller, to avoid even the overhead of arg-passing and the call / ret itself.


Since compiler/linker support for this is not universal and not always successful even for shrink-wrapping, you can write your code in a way that gets much of the benefit, at the cost of splitting the logic of your function into two places.

If you have a fast-path that takes hardly any code, but happens often enough to matter, put that part in a header so it gets inlined, with a fallback to calling the rest of the function (which you make private, so it can assume that any checks in the inlined part are already done).
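A minimal sketch of what that split could look like in C, reusing the question's do_work_on_object (the object.h header and the do_work_on_object_slow name are hypothetical, just for illustration):

// object.h (hypothetical): the cheap early-out check lives here so it inlines into callers
struct object_t;   // opaque type from the question

void do_work_on_object_slow(struct object_t *object, int arg1, int arg2);

static inline void do_work_on_object(struct object_t *object, int arg1, int arg2)
{
  if(object == NULL)
  {
    return;   // fast path: no call, no argument passing, no stack frame
  }
  do_work_on_object_slow(object, arg1, arg2);
}

// object.c: the out-of-line part may assume object != NULL
void do_work_on_object_slow(struct object_t *object, int arg1, int arg2)
{
  // do lots of work
}

Callers keep writing do_work_on_object(object, arg1, arg2); the NULL case now costs only an inlined test-and-branch, while the heavy code stays out of line in object.c.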

For a real-world example, par2's routine that processes a block of data has a fast-path for when the GF16 (Galois 16-bit) factor is zero. (dst[i] += 0 * src[i] is a no-op, even when * is a GF16 multiply and += is a GF16 add, i.e. a bitwise XOR.)

Note how the commit in question renames the old function to InternalProcess, and adds a new template<class g> inline bool ReedSolomon<g>::Process that checks for the fast path and otherwise calls InternalProcess. (It also makes a bunch of unrelated whitespace changes and adds some ifdefs; it was originally a 2006 CVS commit.)

The comment in the commit claims an overall 8% speed gain for repairing.

answered Sep 24 '22 by Peter Cordes