In c++, what is a good heuristic for estimating the compute time benefits of inlining a function, particularly when the function is called very frequently and accounts for >= 10% of the program's execution time (eg. the evaluation function of a brute force or stochastic optimization process). Even though inlining may be ultimately beyond my control, I am still curious.
There is no general answer. It depends on the hardware, the number and type of its arguments, and what is done in the function. And how often it is called, and where. On a Sparc, for example, arguments (and the return value) are passed in registers, and each function gets 16 new registers: if the function is complex enough, those new registers may avoid spilling that would occur if the function were inlined, and the non-inline version may end up faster than the inlined one. On an Intel, which is register poor, and passes arguments in registers, just the opposite might be true, for the same function in the same program. More generally, inlining may increase program size, reducing locality. Or for very simple functions, it may reduce program size; but that again depends on the architecture. The only possible way to know is to try both, measuring the time. And even then you'll only know for that particular program, on that particular hardware.
A function call and return on some architectures take as few as one instruction each (although they're generally not RISC-like single-cycle instructions.) In general, you can compare that to the number of cycles represented by the body of the function. A simple property access might be only a single instruction, and so putting it into a non-inlined function will triple the number of instructions to execute it -- obviously a great candidate for inlining. On the other hand, a function that formats a string for printing might represent hundreds of instructions, so two more isn't going to make any difference at all.
If your bottleneck is in a recursive function, and assuming that the level of recursion is not minimal (i.e. average recursion is not just a few levels), you are better off in working with the algorithm in the function rather than with inlining.
Try, if possible, to transform the recursion into a loop or tail-recursion (that can be implicitly transformed into a loop by the compiler), or try to determine where in the function the cost is being spent. Try to minimize the impact of the internal operations (maybe you are dynamically allocating memory that could have auto storage duration, or maybe you can factor a common operation to be performed external to the function in a wrapper and passed in as an extra argument,...)
*EDIT after the comment that recursion was not intended, but rather iteration*
If the compiler has access to the definition of the function, it will make the right decision for you in most cases. If it does not have access to the definition, just move the code around so that it does see it. Maybe make the function a static
function to provide an extra hint that it won't be used anywhere else, or even mark it as inline
(knowing that this will not force inlining), but avoid using special attributes that will force inlining, as the compiler probably does it better than any simple heuristic that can be produced without looking at the code.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With