Just to make it clear, I'm not going for any sort of portability here, so any solutions that will tie me to a certain box is fine.
Basically, I have an if statement that will 99% of the time evaluate to true, and am trying to eke out every last clock of performance, can I issue some sort of compiler command (using GCC 4.1.2 and the x86 ISA, if it matters) to tell the branch predictor that it should cache for that branch?
On the SPEC'89 benchmarks, very large bimodal predictors saturate at 93.5% correct, once every branch maps to a unique counter. The predictor table is indexed with the instruction address bits, so that the processor can fetch a prediction for every instruction before the instruction is decoded.
This says whether the branch was recently taken or not. Based on this, the processor fetches the next instruction from the target address / sequential address. If the prediction is wrong, flush the pipeline and also flip prediction. So, every time a wrong prediction is made, the prediction bit is flipped.
The idea is very simple, but the effect is surprisingly good. From the data of Wikipedia, we can know that the branch prediction accuracy of modern CPU can reach more than 90%.
Yes, but it will have no effect. Exceptions are older (obsolete) architectures pre Netburst, and even then it doesn't do anything measurable.
There is an "branch hint" opcode Intel introduced with the Netburst architecture, and a default static branch prediction for cold jumps (backward predicted taken, forward predicted non taken) on some older architectures. GCC implements this with the __builtin_expect (x, prediction)
, where prediction is typically 0 or 1. The opcode emitted by the compiler is ignored on all newer processor architecures (>= Core 2). The small corner case where this actually does something is the case of a cold jump on the old Netburst architecture. Intel recommends now not to use the static branch hints, probably because they consider the increase of the code size more harmful than the possible marginal speed up.
Besides the useless branch hint for the predictor, __builtin_expect
has its use, the compiler may reorder the code to improve cache usage or save memory.
There are multiple reasons it doesn't work as expected.
Read more about the inner works of the branch prediction at Agner Fogs manuals. See also the gcc mailing list.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With