Do function pointers force an instruction pipeline to clear?

Modern CPUs are heavily pipelined: they fetch and decode instructions, and load the data those instructions need, long before the instructions actually execute.

Sometimes, the data loaded into the pipeline gets invalidated, and the pipeline must be cleared and reloaded with new data. The time it takes to refill the pipeline can be considerable, and cause a performance slowdown.

If I call a function pointer in C, is the pipeline smart enough to realize that the pointer in the pipeline is a function pointer, and that it should follow that pointer for the next instructions? Or will having a function pointer cause the pipeline to clear and reduce performance?

I'm working in C, but I imagine this is even more important in C++ where many function calls are through v-tables.

Edit: @JensGustedt writes:

"To be a real performance hit for function calls, the function that you call must be extremely brief. If you observe this by measuring your code, you definitely should revisit your design to allow that call to be inlined."

Unfortunately, that may be the trap that I fell into.

I wrote the target function small and fast for performance reasons.

But it is referenced by a function-pointer so that it can easily be replaced with other functions (Just make the pointer reference a different function!). Because I refer to it via a function-pointer, I don't think it can be inlined.

So, I have an extremely brief, not-inlined function.

asked May 25 '12 15:05

abelenky

1 Answer

On some processors an indirect branch will always clear at least part of the pipeline, because it will always mispredict. This is especially the case for in-order processors.

For example, I ran some timings on the processor we develop for, comparing the overhead of an inline function call, versus a direct function call, versus an indirect function call (virtual function or function pointer; they're identical on this platform).

I wrote a tiny function body and measured it in a tight loop of millions of calls, to determine the cost of just the call penalty. The "inline" function was a control group measuring just the cost of the function body (basically a single load op). The direct function measured the penalty of a correctly predicted branch (because it's a static target and the PPC's predictor can always get that right) and the function prologue. The indirect function measured the penalty of a bctrl (branch to count register and link) indirect branch.

614,400,000 function calls:

inline:   411.924 ms  (   2 cycles/call )
direct:  3406.297 ms  ( ~17 cycles/call )
virtual: 8080.708 ms  ( ~39 cycles/call )

As you can see, the direct call costs 15 cycles more than the function body, and the virtual call (exactly equivalent to a function pointer call) costs 22 cycles more than the direct call. That happens to be approximately how many pipeline stages there are between the start of the pipeline (instruction fetch) and the end of the branch ALU. Therefore on this architecture, an indirect branch (aka a virtual call) causes a clear of 22 pipeline stages 100% of the time.

Other architectures may vary. You should make these determinations from direct empirical measurements, or from the CPU's pipeline specifications, rather than assumptions about what processors "should" predict, because implementations are so different. In this case the pipeline clear occurs because there is no way for the branch predictor to know where the bctrl will go until it has retired. At best it could guess that it's to the same target as the last bctrl, and this particular CPU doesn't even try that guess.

answered Oct 03 '22 08:10

Crashworks

Tags: performance, c, x86, function-pointers, pipeline
