Does GCC generate suboptimal code for static branch prediction?

From my university course, I heard, that by convention it is better to place more probable condition in if rather than in else, which may help the static branch predictor. For instance:

if (check_collision(player, enemy)) { // very unlikely to be true     doA(); } else {     doB(); }

may be rewritten as:

if (!check_collision(player, enemy)) {     doB(); } else {     doA(); }

I found a blog post Branch Patterns, Using GCC, which explains this phenomenon in more detail:

Forward branches are generated for if statements. The rationale for making them not likely to be taken is that the processor can take advantage of the fact that instructions following the branch instruction may already be placed in the instruction buffer inside the Instruction Unit.

next to it, it says (emphasis mine):

When writing an if-else statement, always make the "then" block more likely to be executed than the else block, so the processor can take advantage of instructions already placed in the instruction fetch buffer.

Ultimately, there is article, written by Intel, Branch and Loop Reorganization to Prevent Mispredicts, which summarizes this with two rules:

Static branch prediction is used when there is no data collected by the microprocessor when it encounters a branch, which is typically the first time a branch is encountered. The rules are simple:

A forward branch defaults to not taken

A backward branch defaults to taken

In order to effectively write your code to take advantage of these rules, when writing if-else or switch statements, check the most common cases first and work progressively down to the least common.

As I understand, the idea is that pipelined CPU may follow the instructions from the instruction cache without breaking it by jumping to another address within code segment. I am aware, though, that this may be largly oversimplified in case modern CPU microarchitectures.

However, it looks like GCC doesn't respect these rules. Given the code:

extern void foo(); extern void bar();  int some_func(int n) {     if (n) {         foo();     }     else {         bar();     }     return 0; }

it generates (version 6.3.0 with -O3 -mtune=intel):

some_func:         lea     rsp, [rsp-8]         xor     eax, eax         test    edi, edi         jne     .L6            ; here, forward branch if (n) is (conditionally) taken         call    bar         xor     eax, eax         lea     rsp, [rsp+8]         ret .L6:         call    foo         xor     eax, eax         lea     rsp, [rsp+8]         ret

The only way, that I found to force the desired behavior is by rewriting the if condition using __builtin_expect as follows:

if (__builtin_expect(n, 1)) { // force n condition to be treated as true

so the assembly code would become:

some_func:         lea     rsp, [rsp-8]         xor     eax, eax         test    edi, edi         je      .L2             ; here, backward branch is (conditionally) taken         call    foo         xor     eax, eax         lea     rsp, [rsp+8]         ret .L2:         call    bar         xor     eax, eax         lea     rsp, [rsp+8]         ret

What is an advantage of static branch prediction?

What is an advantage of static branch prediction? Increases hardware complexity. Increased performance. Low branch prediction accuracy (no better than chance).

How can branch prediction be improved?

One thing you can do in a high-level language is to eliminate branches by expressing the problem in terms of lookups or arithmetic. This helps branch prediction work better on the remaining branches, because there's more "history" available.

The short answer: no, it is not.

GCC does metrics ton of non trivial optimization and one of them is guessing branch probabilities judging by control flow graph.

According to GCC manual:

fno-guess-branch-probability

Do not guess branch probabilities using heuristics.

GCC uses heuristics to guess branch probabilities if they are not provided by profiling feedback (-fprofile-arcs). These heuristics are based on the control flow graph. If some branch probabilities are specified by __builtin_expect, then the heuristics are used to guess branch probabilities for the rest of the control flow graph, taking the __builtin_expect info into account. The interactions between the heuristics and __builtin_expect can be complex, and in some cases, it may be useful to disable the heuristics so that the effects of __builtin_expect are easier to understand.

-freorder-blocks may swap branches as well.

Also, as OP mentioned the behavior might be overridden with __builtin_expect.

Proof

Look at the following listing.

void doA() { printf("A\n"); } void doB() { printf("B\n"); } int check_collision(void* a, void* b) { return a == b; }  void some_func (void* player, void* enemy) {     if (check_collision(player, enemy)) {         doA();     } else {         doB();     } }  int main() {     // warming up gcc statistic     some_func((void*)0x1, NULL);     some_func((void*)0x2, NULL);     some_func((void*)0x3, NULL);     some_func((void*)0x4, NULL);     some_func((void*)0x5, NULL);     some_func(NULL, NULL);     return 0; }

It is obvious that check_collision will return 0 most of the times. So, the doB() branch is likely and GCC can guess this:

gcc -O main.c -o opt.a objdump -d opt.a

The asm of some_func is:

sub    $0x8,%rsp cmp    %rsi,%rdi je     6c6 <some_func+0x18> mov    $0x0,%eax callq  68f <doB> add    $0x8,%rsp retq    mov    $0x0,%eax callq  67a <doA> jmp    6c1 <some_func+0x13>

But for sure, we can enforce GCC from being too smart:

gcc -fno-guess-branch-probability main.c -o non-opt.a objdump -d non-opt.a

And we will get:

push   %rbp mov    %rsp,%rbp sub    $0x10,%rsp mov    %rdi,-0x8(%rbp) mov    %rsi,-0x10(%rbp) mov    -0x10(%rbp),%rdx mov    -0x8(%rbp),%rax mov    %rdx,%rsi mov    %rax,%rdi callq  6a0 <check_collision> test   %eax,%eax je     6ef <some_func+0x33> mov    $0x0,%eax callq  67a <doA> jmp    6f9 <some_func+0x3d> mov    $0x0,%eax callq  68d <doB> nop leaveq  retq

So GCC will leave branches in source order.

I used gcc 7.1.1 for those tests.

Does GCC generate suboptimal code for static branch prediction?

Tags:

branch-prediction

c

x86

gcc

assembly

Grzegorz Szpetkowski

People also ask

1 Answers

Proof

ivaigult

Recent Activity

Donate For Us

Does GCC generate suboptimal code for static branch prediction?

Tags:

branch-prediction

c

x86

gcc

assembly

Grzegorz Szpetkowski

People also ask

1 Answers

Proof

ivaigult

Related questions

Recent Activity

Donate For Us