In Stroustrup's The C++ programming language , Page 431, when he was discussing about the design of the standard libraries, he said,
For example, building the comparison criteria into a sort function is unacceptable because the same data can be sorted according to different criteria. This is why the C standard library
qsort()
takes a comparison function as an argument rather than relying on something fixed, say, the < operator. On the other hand, the overhead imposed by a function call for each comparison compromisesqsort()
as a building block for further library building.
These above make sense to me. But in the second paragraph, he said,
Is that overhead serious? In most cases, probably not. However, the function call overhead can dominate the execution time for some algorithms and cause users to seek alternatives. The technique of supplying comparison criteria through a template argument described in §13.4 solves that problem.
In §13.4, the comparison criteria are defined as class with static member functions (which does the comparison). When these classes are used as template parameters, the comparison is still done by their static member functions. It seems to me there would still be overheads for calling the static member function.
What did Stroustrup mean by saying that?
std::sort
is a function template. A separate sort
instance will be created for each type and comparison operator during compilation. And because for each sort
instantiation the type and the comparator is known at compile time, this allows inlining the comparator function and therefore avoiding the cost of a function call.
There is no theoretical reason why sort
need be faster than qsort
. Some compilers will even inline the function pointer passed to functions 'like' qsort
: I believe I have seen gcc or clang do this (not to qsort
), and even do so where the function definition was in a different cpp file.
The important part is that sort
gets passed the function object as a type as well as an instance. A different function for each such type is generated: template
s are factories for functions. At the point where it is called, the exact function called is really easy to determine for each such function instance, so inlining is trivial.
Doing the same with a function pointer is possible, but requires inlining from the point where qsort
is invoked, and tracking the immutability of the function pointer carefully, and knowing which function it was to start with. This is far more fragile than the above mechanism in practice.
Similar issues appear with element stride (clearly static when sort
ing an array, harder to deal with when using qsort
) and the like.
Calling a function via a pointer has two overheads: the pointer dereferencing and a function call overhead. This is a runtime process.
Template instantiation is done by compiler. Pointer dereferencing is eliminated as there is no pointer obviously. Function call overhead is optimised out by a compiler by inlining the call.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With