
Is there any way a C/C++ compiler can inline a C-Callback Function?

Given a typical function that takes a C function pointer as a callback, like the C standard library's qsort(), can any compiler optimize the code by inlining the callback? I think it cannot -- is that correct?

int cmp(const void* pa, const void* pb) { /*...*/ }
void func(void) {
  int vec[1000];
  qsort(vec, 1000, sizeof(int), &cmp);
}

Ok, qsort() is a function from an external library, but I don't think even LTO would help here, right?

But what if I have my_qsort() defined in the same compilation unit, would then inlining be possible for the compiler?

int cmp(const void* pa, const void* pb) { /*...*/ }
void my_qsort(int* vec, int n, int sz, int (*cmp)(const void*, const void*)) { /* ... */ }
void func(void) {
  int vec[1000];
  my_qsort(vec, 1000, sizeof(int), &cmp);
}

Does that make any difference? I think the use of a C function pointer as the callback is what prevents the compiler from inlining. Correct?

(I just want to make sure I understand why I should use functors in C++.)

asked Feb 03 '23 by towi

1 Answer

No, it's not possible, at least with the way a traditional tool-chain works. The traditional order of operations is that all compilation is done, then linking is done.

To generate your comparison function inline, the compiler would first have to generate the code for qsort itself inline (since each instance of qsort will usually use a different comparison function). In the case of something like qsort, however, it's typically been compiled and placed in the standard library before you ever even start to think about writing your code. When you compile your code, qsort is only available as an object file.

As such, to even stand a chance of doing anything like this, you need to build the inlining capability into the linker rather than the compiler. At least in theory that's possible, but it's decidedly non-trivial -- at least in my estimation, it's almost certainly more difficult than when working with source code. It also requires duplicating quite a bit of compiler-like functionality in the linker, and probably requires adding a fair amount of extra information into the object file to give the linker enough information to work with that it can even try to do the job.

Edit: perhaps I should go into more detail, lest the comment chain turn into a full-fledged argument over little more than wording.

Traditionally, a linker is fundamentally a fairly simple sort of beast. It starts from an object file that can be divided into four primary things:

  1. A collection of bits that are to be copied (unchanged, except as specifically directed) from the object file to the executable being produced.
  2. A list of symbols that the object file contains.
  3. A list of symbols used by but not supplied by the object file.
  4. A list of fixups where addresses need to be written.

The linker then starts matching up the symbols exported by one file and used in another. It then looks in the object files in the library (or libraries) to resolve more symbols. Any time it adds in a file, it also adds that file's list of needed symbols, and searches recursively for other object files that can satisfy those.

When it has found object files that supply all the symbols, it copies the collection of bits part of each into the output file, and where the fixup records tell it to, it writes the relative addresses assigned to specific symbols (e.g., where you've called printf, it figures out where in the executable file it copied the bits that make up printf, and fills in your call with that address). In reasonably recent cases, rather than copying bits from the library it can embed a reference to a shared object/DLL into the executable, and leave it to the loader to actually find/load that file at run-time to supply the actual code for a symbol.
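The resolution process described above can be sketched as a toy model (entirely hypothetical names; real linkers also handle fixups, sections, and much more):

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model: each "object file" lists the symbols it defines and the
// symbols it needs. Starting from the root object, the "linker" pulls in
// library objects until every needed symbol has a definition.
struct Obj {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> needs;
};

std::vector<std::string> link(const Obj& root, const std::vector<Obj>& lib) {
    std::vector<std::string> linked{root.name};
    std::set<std::string> defined = root.defines;
    std::set<std::string> pending = root.needs;
    bool progress = true;
    while (!pending.empty() && progress) {
        progress = false;
        for (const Obj& o : lib) {
            bool supplies = false;            // pull in o iff it satisfies
            for (const auto& s : o.defines)   // some still-pending symbol
                if (pending.count(s)) supplies = true;
            if (!supplies) continue;
            linked.push_back(o.name);
            for (const auto& s : o.defines) { defined.insert(s); pending.erase(s); }
            for (const auto& s : o.needs)     // its own needs recurse
                if (!defined.count(s)) pending.insert(s);
            progress = true;
        }
    }
    assert(pending.empty() && "unresolved symbols");
    return linked;
}
```

Note that nothing in this model ever looks *inside* the blocks of bits being linked -- which is precisely the point made next.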

In particular, however, the linker is traditionally oblivious to the actual content of the blocks of bits it's copying. You can (for example) quite reasonably use exactly the same linker to deal with code for any of a number of different processors. As long as they all use the same object and executable file formats, it's fine.

Link time optimization does change that to at least some degree. Clearly, to optimize the code, we need some sort of extra intelligence that happens at what was traditionally considered link time. There are (at least) two ways to do that:

  1. build quite a bit of extra intelligence into the linker, or
  2. keep the intelligence in the compiler, and have the linker invoke it to do the optimization.

There are examples of both of those -- LLVM (for one obvious example) takes pretty much the former route: the front-end compiler emits LLVM bitcode, and the LLVM link-time machinery puts a lot of intelligence/work into translating that into an optimized executable. gcc with GIMPLE takes the latter route: the GIMPLE records basically give the linker enough information that it can feed the bits in a number of object files back to the compiler, have the compiler optimize them, and then feed the result back to the linker to actually copy into the executable.

I suppose you can probably come up with some sort of philosophical viewpoint that says these two are basically equivalent -- but I kind of doubt that anybody who'd implemented both would agree.

Now, it is true (probably, anyway) that either of those would suffice to implement the optimization at hand. Personally, I doubt that anybody implements this optimization for its own sake though. When you get down to it, qsort and bsearch are almost the only two reasonably common functions to which it would/will normally apply. For most practical purposes, that means you'd be implementing the optimization exclusively for the sake of qsort.

On the other hand, if the tools involved include the ability to produce inline functions and link time optimization, then I suppose there's at least a reasonable chance that you could end up with this particular type of optimization happening as a more or less accidental side-effect of the two coming together.

At least in theory, that means it could happen. There's one more wrinkle to take into account though: completely independent of the optimization at hand, many compilers will not generate inline code for a recursive function. To even attempt to, the compiler has to first convert the recursive function to an iterative form. That's fairly common in the case of tail recursion -- but Quick sort is not tail recursive. Nearly the only alternative is an implementation of qsort that isn't recursive to start with. That's certainly possible, but just as certainly rather unusual.

As such, even when/if the toolchain could support inline generation of a callback, it probably won't in the case of qsort (which, I'll admit, is the only case I've personally tested). For better or worse, however, qsort is nearly the only function of this sort that's common enough for it to matter much either way.
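For illustration, here is a sketch (my own, not from the answer) of the kind of non-recursive quicksort mentioned above: the recursion is replaced by an explicit stack of subranges, giving the compiler a plain loop into which a statically-known comparator can be inlined.

```cpp
#include <utility>
#include <vector>

// Non-recursive quicksort sketch: an explicit stack of [lo, hi) subranges
// replaces the recursive calls. With the comparator as a template
// parameter, its body is visible at the call sites inside the loop.
template <typename Cmp>
void iterative_qsort(int* a, int n, Cmp cmp) {
    std::vector<std::pair<int, int>> stack{{0, n}};
    while (!stack.empty()) {
        auto [lo, hi] = stack.back();
        stack.pop_back();
        if (hi - lo < 2) continue;
        int pivot = a[hi - 1], i = lo;           // Lomuto partition
        for (int j = lo; j < hi - 1; ++j)
            if (cmp(a[j], pivot)) std::swap(a[i++], a[j]);
        std::swap(a[i], a[hi - 1]);
        stack.push_back({lo, i});                // "recurse" via the stack
        stack.push_back({i + 1, hi});
    }
}
```

This is only a demonstration of the iterative form; whether a given compiler actually performs the inlining is, as the answer says, another matter.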

answered Feb 05 '23 by Jerry Coffin