I've lately encountered a lot of functions where gcc generates really bad code on x86. They all fit a pattern of:
if (some_condition) {
/* do something really simple and return */
} else {
/* something complex that needs lots of registers */
}
Think of the simple case as something so small that half or more of the work is spent pushing and popping registers it never modifies. If I were writing the asm by hand, I would save and restore the callee-saved (saved-across-calls) registers only inside the complex case, and avoid touching the stack pointer at all in the simple case.
Is there any way to get gcc to be a little bit smarter and do this itself? Preferably with command line options rather than ugly hacks in the source...
Edit: To make it concrete, here's something very close to some of the functions I'm dealing with:
if (buf->pos < buf->end) {
return *buf->pos++;
} else {
/* fill buffer */
}
and another one:
if (!initialized) {
/* complex initialization procedure */
}
return &initialized_object;
and another:
if (mutex->type == SIMPLE) {
return atomic_swap(&mutex->lock, 1);
} else {
/* deal with ownership, etc. */
}
Edit 2: I should have mentioned to begin with: these functions cannot be inlined. They have external linkage and they're library code. Allowing them to be inlined in the application would result in all kinds of problems.
Update
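To explicitly suppress inlining for a single function in gcc, put the attribute before the declarator (gcc rejects an attribute placed after the parameter list of a function definition):
__attribute__((noinline)) void foo(void)
{
...
}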
See also How can I tell gcc not to inline a function?
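A minimal, self-contained sketch of the attribute in context, assuming a hypothetical helper `slow_sum` and wrapper `sum_or_zero` (neither is from the question). The wrapper handles the trivial case itself and calls the out-of-line helper only when there is real work; compiling with `gcc -O2 -S` lets you check that the helper stays a separate function.

```c
#include <stddef.h>

/* Hypothetical slow path. The attribute precedes the declarator,
   which is where gcc accepts it on a function definition. */
__attribute__((noinline))
static long slow_sum(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Hypothetical fast path: returns immediately for the trivial case
   and only calls the out-of-line helper when there is work to do. */
long sum_or_zero(const int *v, size_t n)
{
    if (n == 0)
        return 0;
    return slow_sum(v, n);
}
```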
Functions like this will regularly be inlined automatically unless compiled with -O0 (optimization disabled).
In C++ you can hint the compiler using the inline keyword.
If the compiler won't take your hint, you are probably using too many registers/branches inside the function. The situation is almost certainly resolved by extracting the 'complicated' block into its own function.
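Applied to the lazy-initialization example from the question, the extraction might look like the sketch below. The `struct object` layout, the table contents, and the names `init_object_slow` and `get_object` are hypothetical stand-ins for the real initialization; the point is that after the first call the public function is only a load, a compare, and a return.

```c
#include <stdbool.h>

/* Hypothetical object handed out by the library. */
struct object {
    int table[256];
};

static bool initialized;
static struct object initialized_object;

/* Complex initialization, kept out of line so its register saves and
   stack frame are not paid on every call to get_object(). */
__attribute__((noinline))
static void init_object_slow(void)
{
    for (int i = 0; i < 256; i++)
        initialized_object.table[i] = i * i;
    initialized = true;
}

/* Public entry point. Not thread-safe as written, same as the
   pattern in the question. */
struct object *get_object(void)
{
    if (!initialized)
        init_object_slow();
    return &initialized_object;
}
```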
Update: I noticed you added the fact that they are extern symbols. (Please update the question with that crucial info.) Well, in a sense, with external functions, all bets are off. I cannot really believe that gcc will by definition inline all of a complex function into a tiny caller simply because it is only called from there. Perhaps you can give some sample code that demonstrates the behaviour and we can find the proper optimization flags to remedy that?
Also, is this C or C++? In C++ I know it is commonplace to define the trivial decision functions inline (mostly as members defined in the class declaration). This won't give a linkage conflict like with simple (extern) C functions.
Also you can have template functions defined that will inline perfectly in all compilation modules without resulting in link conflicts.
I hope you are using C++ because it will give you a ton of options here.
I would do it like this:
static void complex_function() {}
void foo()
{
if(simple_case) {
// do whatever
return;
} else {
complex_function();
}
}
The compiler may insist on inlining complex_function(), in which case you can use the noinline attribute on it.
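Here is the same split sketched for the buffered-read example from the question. It assumes a hypothetical `struct buf` that reads from an in-memory source through a deliberately tiny 4-byte buffer, plus hypothetical helpers `buf_init` and `buf_refill`; a real library would refill from a file descriptor instead. `buf_getc` keeps the question's fast path unchanged and hands everything else to the noinline refill function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical buffered reader over an in-memory source. */
struct buf {
    const unsigned char *pos, *end;   /* unread window inside data[] */
    unsigned char data[4];            /* tiny, so refills happen often */
    const unsigned char *src;         /* remaining source bytes */
    size_t src_len;
};

void buf_init(struct buf *b, const void *src, size_t len)
{
    b->pos = b->end = b->data;        /* empty window forces a refill */
    b->src = src;
    b->src_len = len;
}

/* Slow path: refill the buffer, then return the next byte, or -1 at EOF. */
__attribute__((noinline))
static int buf_refill(struct buf *b)
{
    size_t n = b->src_len < sizeof b->data ? b->src_len : sizeof b->data;
    if (n == 0)
        return -1;
    memcpy(b->data, b->src, n);
    b->src += n;
    b->src_len -= n;
    b->pos = b->data;
    b->end = b->data + n;
    return *b->pos++;
}

/* Fast path from the question: one compare, one load, one increment. */
int buf_getc(struct buf *b)
{
    if (b->pos < b->end)
        return *b->pos++;
    return buf_refill(b);
}
```

Reading "hello" through this buffer takes two refills (4 bytes, then 1) and returns -1 on the sixth call.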