Let's suppose a simple if like this:
if (something)
// do_something
else
// do_else
Suppose this if-else statement is executed in parallel by several threads, and the condition evaluates differently in each thread but stays constant for that thread's whole lifetime. For example, in thread 1 the condition is always false, in thread 2 it is always true, in thread 3 it is always true as well, and so on.
Does the branch predictor take each thread's execution context into account when gathering its statistics? If it doesn't (I assume it does, but that's difficult to verify by testing), the CPU would see the condition as following a random pattern and could not predict it at all.
If we ignore SMT (e.g. Hyper-Threading), most architectures have one branch predictor per hardware thread, i.e. per core. It's tightly coupled with the fetch unit of the individual core. A few (AMD?) store some branch prediction information in the L1/L2 I-cache, but mostly just the target for the next fetch.
So if you don't run your code on an SMT core, you are in heaven: the branch will be predicted correctly practically every time, at the cost of a few instructions.
If you run your code on an SMT core, you will often find your life is hell, with 50+% mispredictions.
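To make the "private versus shared predictor state" argument concrete, here is a deliberately simplified toy model. It assumes a single 2-bit saturating counter per branch and strict round-robin interleaving of the threads. Real predictors use branch history and are far more sophisticated, so the numbers only illustrate the reasoning and are not a measurement of any CPU. The names `TwoBitCounter` and `count_mispredicts` are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy 2-bit saturating counter: states 0..3, predicts "taken" when state >= 2.
// This is an illustrative model only, not the predictor of any real CPU.
struct TwoBitCounter {
    int state = 0;  // start at "strongly not taken"
    bool predict() const { return state >= 2; }
    void update(bool taken) {
        if (taken) {
            if (state < 3) ++state;
        } else {
            if (state > 0) --state;
        }
    }
};

// Counts mispredictions when several hypothetical threads, each with a fixed
// branch outcome, execute the same branch in round-robin order.
// shared == true : all threads update one counter (predictor state is shared).
// shared == false: every thread has its own counter (predictor state is private).
std::size_t count_mispredicts(const std::vector<bool>& outcome_per_thread,
                              std::size_t rounds, bool shared) {
    std::vector<TwoBitCounter> counters(shared ? 1 : outcome_per_thread.size());
    std::size_t misses = 0;
    for (std::size_t r = 0; r < rounds; ++r) {
        for (std::size_t t = 0; t < outcome_per_thread.size(); ++t) {
            TwoBitCounter& c = counters[shared ? 0 : t];
            const bool taken = outcome_per_thread[t];
            if (c.predict() != taken) ++misses;
            c.update(taken);
        }
    }
    return misses;
}
```

With two threads whose outcomes are `{false, true}` and 1000 rounds, the private-counter variant in this model mispredicts only during warm-up, while the shared-counter variant mispredicts once per round, i.e. half of all executions of the branch.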
You can solve this problem easily; you just have to write more code: check your condition earlier and call a version of your code that contains only do_something or only do_else.
If you have a loop that calls the function containing your branch, you can do something like:
if (something) do_something_loop(); else do_else_loop();
void do_something_loop() { for (auto x : myVec) do_something; }
This has the disadvantage that you need to maintain two nearly identical copies of the code.
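A minimal compilable sketch of this hoisting idea follows. The per-element functions `do_something` and `do_else`, the `int` element type and the summing are placeholders invented for illustration; only the shape (test the condition once, then run a branch-free loop) comes from the text above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-element work for each side of the condition.
inline int do_something(int x) { return x + 1; }
inline int do_else(int x) { return x * 2; }

// Baseline: the condition is tested again for every element.
long long process_branch_inside(const std::vector<int>& values, bool something) {
    long long total = 0;
    for (int x : values) {
        if (something) total += do_something(x);
        else           total += do_else(x);
    }
    return total;
}

// Hoisted variant: one loop per outcome, with no branch in the loop body.
long long do_something_loop(const std::vector<int>& values) {
    long long total = 0;
    for (int x : values) total += do_something(x);
    return total;
}

long long do_else_loop(const std::vector<int>& values) {
    long long total = 0;
    for (int x : values) total += do_else(x);
    return total;
}

// The condition is checked exactly once, outside the loops.
long long process_hoisted(const std::vector<int>& values, bool something) {
    return something ? do_something_loop(values) : do_else_loop(values);
}

// Sanity check: hoisting must not change the result.
void check_hoisting(const std::vector<int>& values) {
    for (bool something : {false, true})
        assert(process_branch_inside(values, something) ==
               process_hoisted(values, something));
}
```

An optimizing compiler may perform this transformation on its own (loop unswitching) when it can prove the condition is loop-invariant, but writing it out by hand removes the dependence on that optimization.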
Or you can put your branch in a function, brancher(), and make it a function template parameterized on the condition. Thanks to the magic of dead code elimination, you should not get any branches in the loops.
C++ concept code:
template<bool b_something>
void brancher() {
    // do things
    if (b_something) {  // compile-time constant, so the untaken side is eliminated
        // do_something
    } else {
        // do_else
    }
    // do more things
}
void branch_user() {
if (something) {
for (auto x : myVec)
brancher<true>();
} else {
for (auto x : myVec)
brancher<false>();
}
}
Now you only have to maintain the two branches in the outer function, which is hopefully less work.
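For completeness, here is one way the template approach might look as compilable code. It is a sketch under assumptions: it requires C++17 for `if constexpr` (the concept code above relies on ordinary dead code elimination instead), and the arithmetic inside `brancher` as well as the helpers `run_loop` and `demo` are invented stand-ins for the real work.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop body: the compile-time flag selects one side of the branch,
// so each instantiation contains only the code for its own path.
template <bool b_something>
int brancher(int x) {
    int result = x + 1;       // stand-in for "do things"
    if constexpr (b_something) {
        result *= 10;         // stand-in for do_something
    } else {
        result -= 1;          // stand-in for do_else
    }
    return result;            // stand-in for "do more things"
}

// The loop is written once and instantiated for each outcome.
template <bool b_something>
long long run_loop(const std::vector<int>& values) {
    long long total = 0;
    for (int x : values) total += brancher<b_something>(x);
    return total;
}

// The run-time condition is tested once and picks an instantiation.
long long branch_user(const std::vector<int>& values, bool something) {
    return something ? run_loop<true>(values) : run_loop<false>(values);
}

// Usage sketch with hand-computed expected values.
inline void demo() {
    const std::vector<int> values{1, 2, 3};
    assert(branch_user(values, true) == 90);  // (2 + 3 + 4) * 10
    assert(branch_user(values, false) == 6);  // 1 + 2 + 3
}
```

Making the loop itself a template, as `run_loop` does here, goes one step further than the concept code: even the outer function no longer duplicates the loop, only the single dispatch on the run-time condition.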