How "sticky" is the branch predictor logic? If code is being removed from the instruction caches, do the statistics stay with it?
Put another way, if the code is complex or not processing things in batch, is branch prediction still going to help?
Let's assume commodity Intel server hardware newer than 2011.
The exact workings of branch predictors will vary between processors. But nearly all non-trivial branch predictors need a history of the branches in the program to function.
This history is recorded in the processor's branch history buffers. These come in multiple flavors; the two most commonly studied are:

- Local branch history: a separate history is kept for each conditional branch, recording that branch's own recent outcomes.
- Global branch history: a single shared history records the outcomes of the most recently executed branches, which lets the predictor exploit correlations between different branches.
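To make the two flavors concrete, here is a toy simulation (my own sketch, not a description of any real Intel or AMD predictor). It models a table of 2-bit saturating counters indexed two ways: by a per-branch local history, and gshare-style by a global history register. The branch "address", table size, and the correlated-branch pattern are all invented for illustration; the point is only that global history can predict a branch whose outcome depends on an earlier, different branch, which that branch's own local history cannot.

    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    // 2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken.
    struct TwoBitCounter {
        uint8_t state = 2;
        bool predict() const { return state >= 2; }
        void update(bool taken) {
            if (taken && state < 3) ++state;
            if (!taken && state > 0) --state;
        }
    };

    int main() {
        const size_t TABLE_SIZE = 1024;          // made-up table size
        std::vector<TwoBitCounter> local_table(TABLE_SIZE);
        std::vector<TwoBitCounter> gshare_table(TABLE_SIZE);

        // Two synthetic branches: A's outcome is random, B's outcome simply
        // copies A's.  B is hopeless for a predictor that only sees B's own
        // past, but easy for one that sees the global history, which contains
        // A's most recent outcome.
        const uint32_t addr_B = 0x400200;        // invented "address" of branch B
        uint32_t history_B_local = 0;            // B's own outcome history
        uint32_t global_history  = 0;            // history of all recent branches
        std::mt19937 rng(1);

        size_t local_hits = 0, gshare_hits = 0;
        const size_t total = 200000;

        for (size_t i = 0; i < total; ++i) {
            // Branch A executes; we only record its outcome in the global history.
            bool a_taken = rng() & 1;
            global_history = ((global_history << 1) | a_taken) % TABLE_SIZE;

            // Branch B: its outcome is correlated with A's.
            bool b_taken = a_taken;

            // Local-history prediction of B: index by B's own past outcomes.
            TwoBitCounter &l = local_table[(addr_B ^ history_B_local) % TABLE_SIZE];
            if (l.predict() == b_taken) ++local_hits;
            l.update(b_taken);

            // gshare prediction of B: index by B's address XOR global history.
            TwoBitCounter &g = gshare_table[(addr_B ^ global_history) % TABLE_SIZE];
            if (g.predict() == b_taken) ++gshare_hits;
            g.update(b_taken);

            // Shift B's outcome into both histories.
            history_B_local = ((history_B_local << 1) | b_taken) % TABLE_SIZE;
            global_history  = ((global_history  << 1) | b_taken) % TABLE_SIZE;
        }

        printf("B, local history:  %.1f%% correct\n", 100.0 * local_hits / total);
        printf("B, global history: %.1f%% correct\n", 100.0 * gshare_hits / total);
    }

On this synthetic trace the local-history table should hover around 50% on branch B while the gshare table is nearly perfect, because the information needed to predict B lives in another branch's outcome.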
Modern processors will have multiple buffers for different purposes. In all cases, the buffers have a limited size. So when they run out of room, something will need to be evicted.
Neither Intel nor AMD gives details about their branch predictors. But it is believed that current processors from both companies can track thousands of branches along with their histories.
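The capacity limit can be illustrated with the same kind of toy table (again an invented model with an assumed 1024-entry table, not real hardware). Real structures may be set-associative and evict entries; this direct-mapped toy shows the analogous effect of interference: every simulated branch is perfectly predictable on its own, but once there are far more branches than counters, they alias onto shared entries and disturb each other's state.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Same toy 2-bit saturating counter as in the previous sketch.
    struct TwoBitCounter {
        uint8_t state = 2;
        bool predict() const { return state >= 2; }
        void update(bool taken) {
            if (taken && state < 3) ++state;
            if (!taken && state > 0) --state;
        }
    };

    // Simulate `num_branches` branches, each with a fixed outcome (so each is
    // 100% predictable in isolation), sharing one 1024-entry counter table.
    static double accuracy(size_t num_branches) {
        const size_t TABLE_SIZE = 1024;          // assumed, purely illustrative
        std::vector<TwoBitCounter> table(TABLE_SIZE);
        size_t hits = 0, total = 0;

        for (int round = 0; round < 100; ++round) {
            for (size_t b = 0; b < num_branches; ++b) {
                bool taken = (b % 3 == 0);                 // constant per branch
                TwoBitCounter &c = table[b % TABLE_SIZE];  // address-indexed entry
                if (c.predict() == taken) ++hits;
                c.update(taken);
                ++total;
            }
        }
        return 100.0 * hits / total;
    }

    int main() {
        for (size_t n : {512, 1024, 4096, 16384})
            printf("%6zu branches -> %5.1f%% predicted correctly\n", n, accuracy(n));
    }

While every branch fits in the table, accuracy stays close to 100%; once the branch count exceeds the entry count, the shared counters get pulled in conflicting directions and accuracy falls sharply.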
Getting back to the point, the data that is used by the branch predictors will "stick" for as long as it stays in the history buffers. So the performance of the predictors is best if the code is small and well-behaved enough to not overrun the buffers.
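A quick way to see the "well-behaved" part on real hardware is the classic sorted-versus-shuffled benchmark below (my own sketch, nothing from the original answer). The same branch executes the same number of times in both runs, but with sorted data its outcomes follow a single pattern that the recorded history captures easily, while with shuffled data the history is of little use. Build with optimization low enough that the compiler keeps the branch (e.g. -O1); at higher levels it may turn the branch into a conditional move or vectorize it, and the difference disappears.

    #include <algorithm>
    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    int64_t g_sink = 0;   // keeps the computed sums live so the loops aren't removed

    // The data-dependent branch whose predictability we vary.
    static int64_t sum_over_threshold(const std::vector<int> &v) {
        int64_t sum = 0;
        for (int x : v)
            if (x >= 128)
                sum += x;
        return sum;
    }

    static double run_ms(const std::vector<int> &v) {
        auto t0 = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 100; ++rep)
            g_sink += sum_over_threshold(v);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        std::vector<int> shuffled(1 << 20);
        std::mt19937 rng(7);
        for (int &x : shuffled) x = rng() % 256;   // values 0..255, branch ~50/50

        std::vector<int> sorted = shuffled;
        std::sort(sorted.begin(), sorted.end());   // same values, one transition

        printf("shuffled data: %7.1f ms\n", run_ms(shuffled));
        printf("sorted data:   %7.1f ms\n", run_ms(sorted));
        printf("(checksum %lld)\n", (long long)g_sink);
    }

On Linux you can confirm that the time difference really comes from mispredictions with hardware counters, e.g. `perf stat -e branches,branch-misses ./a.out`; running the sorted and shuffled cases in separate invocations makes the difference in branch-misses obvious.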
Note that the instruction and uop caches, while independent of the branch predictor, will exhibit the same effects. So it may be difficult to single out the branch predictor when attempting to construct test cases and benchmarks to study its behavior.
So this is yet another case where locality pays off for performance.