Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Effects of branch prediction on performance?

When I'm writing some tight loop that needs to work fast I am often bothered by thoughts about how the processor branch prediction is going to behave. For instance I try my best to avoid having an if statement in the most inner loop, especially one with a result which is not somewhat uniform (say evaluates to true or false randomly).

I tend to do that because of the somewhat common knowledge that the processor pre-fetches instructions and if it turned out that it mis-predicted a branch then the pre-fetch is useless.

My question is - Is this really an issue with modern processors? How good can branch prediction expected to be?
What coding patterns can be used to make it better?

(For the sake of the discussion, assume that I am beyond the "early-optimization is the root of all evil" phase)

like image 293
shoosh Avatar asked Nov 14 '08 07:11

shoosh


People also ask

How does branch prediction improve performance?

In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the instruction pipeline.

Why is branch prediction important?

Branch prediction is very important to the performance of a deeply pipelined processor. Branch prediction enables the processor to begin executing instructions long before the branch outcome is certain. Branch delay is the penalty that is incurred in the absence of a correct prediction.

What happens if branch prediction is wrong?

If it guessed wrong it has to undo anything the speculatively executed instructions might have done, clear the pipeline and begin sending down the instructions it was supposed be executing. This results in the same delay as if it hadn't made a prediction at all.

What is branch prediction and how can it control hazards?

the idea behind branch prediction is simple: if we can correctly predict whether branches are taken or not, we can reduce the stalls due to control hazards.


1 Answers

Branch prediction is pretty darned good these days. But that doesn't mean the penalty of branches can be eliminated.

In typical code, you probably get well over 99% correct predictions, and yet the performance hit can still be significant. There are several factors at play in this.

One is the simple branch latency. On a common PC CPU, that might be in the order of 12 cycles for a mispredict, or 1 cycle for a correctly predicted branch. For the sake of argument, let's assume that all your branches are correctly predicted, then you're home free, right? Not quite.

The simple existence of a branch inhibits a lot of optimizations. The compiler is unable to reorder code efficiently across branches. Within a basic block (that is, a block of code that is executed sequentially, with no branches, one entry point and one exit), it can reorder instructions as it likes, as long as the meaning of the code is preserved, because they'll all be executed sooner or later. Across branches, it gets trickier. We could move these instructions down to execute after this branch, but then how do we guarantee they get executed? Put them in both branches? That's extra code size, that's messy too, and it doesn't scale if we want to reorder across more than one branch.

Branches can still be expensive, even with the best branch prediction. Not just because of mispredicts, but because instruction scheduling becomes so much harder.

This also implies that rather than the number of branches, the important factor is how much code goes in the block between them. A branch on every other line is bad, but if you can get a dozen lines into a block between branches, it's probably possible to get those instructions scheduled reasonably well, so the branch won't restrict the CPU or compiler too much.

But in typical code, branches are essentially free. In typical code, there aren't that many branches clustered closely together in performance-critical code.

like image 154
jalf Avatar answered Oct 19 '22 14:10

jalf