From my university course, I heard, that by convention it is better to place more probable condition in if
rather than in else
, which may help the static branch predictor. For instance:
if (check_collision(player, enemy)) { // very unlikely to be true doA(); } else { doB(); }
may be rewritten as:
if (!check_collision(player, enemy)) { doB(); } else { doA(); }
I found a blog post Branch Patterns, Using GCC, which explains this phenomenon in more detail:
Forward branches are generated for if statements. The rationale for making them not likely to be taken is that the processor can take advantage of the fact that instructions following the branch instruction may already be placed in the instruction buffer inside the Instruction Unit.
next to it, it says (emphasis mine):
When writing an if-else statement, always make the "then" block more likely to be executed than the else block, so the processor can take advantage of instructions already placed in the instruction fetch buffer.
Ultimately, there is article, written by Intel, Branch and Loop Reorganization to Prevent Mispredicts, which summarizes this with two rules:
Static branch prediction is used when there is no data collected by the microprocessor when it encounters a branch, which is typically the first time a branch is encountered. The rules are simple:
- A forward branch defaults to not taken
- A backward branch defaults to taken
In order to effectively write your code to take advantage of these rules, when writing if-else or switch statements, check the most common cases first and work progressively down to the least common.
As I understand, the idea is that pipelined CPU may follow the instructions from the instruction cache without breaking it by jumping to another address within code segment. I am aware, though, that this may be largly oversimplified in case modern CPU microarchitectures.
However, it looks like GCC doesn't respect these rules. Given the code:
extern void foo(); extern void bar(); int some_func(int n) { if (n) { foo(); } else { bar(); } return 0; }
it generates (version 6.3.0 with -O3 -mtune=intel
):
some_func: lea rsp, [rsp-8] xor eax, eax test edi, edi jne .L6 ; here, forward branch if (n) is (conditionally) taken call bar xor eax, eax lea rsp, [rsp+8] ret .L6: call foo xor eax, eax lea rsp, [rsp+8] ret
The only way, that I found to force the desired behavior is by rewriting the if
condition using __builtin_expect
as follows:
if (__builtin_expect(n, 1)) { // force n condition to be treated as true
so the assembly code would become:
some_func: lea rsp, [rsp-8] xor eax, eax test edi, edi je .L2 ; here, backward branch is (conditionally) taken call foo xor eax, eax lea rsp, [rsp+8] ret .L2: call bar xor eax, eax lea rsp, [rsp+8] ret
What is an advantage of static branch prediction? Increases hardware complexity. Increased performance. Low branch prediction accuracy (no better than chance).
One thing you can do in a high-level language is to eliminate branches by expressing the problem in terms of lookups or arithmetic. This helps branch prediction work better on the remaining branches, because there's more "history" available.
The short answer: no, it is not.
GCC does metrics ton of non trivial optimization and one of them is guessing branch probabilities judging by control flow graph.
According to GCC manual:
fno-guess-branch-probability
Do not guess branch probabilities using heuristics.
GCC uses heuristics to guess branch probabilities if they are not provided by profiling feedback (
-fprofile-arcs
). These heuristics are based on the control flow graph. If some branch probabilities are specified by__builtin_expect
, then the heuristics are used to guess branch probabilities for the rest of the control flow graph, taking the__builtin_expec
t info into account. The interactions between the heuristics and__builtin_expect
can be complex, and in some cases, it may be useful to disable the heuristics so that the effects of__builtin_expect
are easier to understand.
-freorder-blocks
may swap branches as well.
Also, as OP mentioned the behavior might be overridden with __builtin_expect
.
Look at the following listing.
void doA() { printf("A\n"); } void doB() { printf("B\n"); } int check_collision(void* a, void* b) { return a == b; } void some_func (void* player, void* enemy) { if (check_collision(player, enemy)) { doA(); } else { doB(); } } int main() { // warming up gcc statistic some_func((void*)0x1, NULL); some_func((void*)0x2, NULL); some_func((void*)0x3, NULL); some_func((void*)0x4, NULL); some_func((void*)0x5, NULL); some_func(NULL, NULL); return 0; }
It is obvious that check_collision
will return 0
most of the times. So, the doB()
branch is likely and GCC can guess this:
gcc -O main.c -o opt.a objdump -d opt.a
The asm of some_func
is:
sub $0x8,%rsp cmp %rsi,%rdi je 6c6 <some_func+0x18> mov $0x0,%eax callq 68f <doB> add $0x8,%rsp retq mov $0x0,%eax callq 67a <doA> jmp 6c1 <some_func+0x13>
But for sure, we can enforce GCC from being too smart:
gcc -fno-guess-branch-probability main.c -o non-opt.a objdump -d non-opt.a
And we will get:
push %rbp mov %rsp,%rbp sub $0x10,%rsp mov %rdi,-0x8(%rbp) mov %rsi,-0x10(%rbp) mov -0x10(%rbp),%rdx mov -0x8(%rbp),%rax mov %rdx,%rsi mov %rax,%rdi callq 6a0 <check_collision> test %eax,%eax je 6ef <some_func+0x33> mov $0x0,%eax callq 67a <doA> jmp 6f9 <some_func+0x3d> mov $0x0,%eax callq 68d <doB> nop leaveq retq
So GCC will leave branches in source order.
I used gcc 7.1.1 for those tests.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With