Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Static branch prediction / GCC optimization

Consider the following C program:

void bar();
void baz();

void foo( int a ) {
    if ( a ) {
        bar();
    }
    else {
        baz();
    }
}

On my x86-64-based computer, the instructions generated by GCC with the -O1 optimization level gives:

 0: sub    $0x8,%rsp
 4: test   %edi,%edi
 6: je     14 <foo+0x14>
 8: mov    $0x0,%eax
 d: callq  12 <foo+0x12> # relocation to bar
12: jmp    1e <foo+0x1e>
14: mov    $0x0,%eax
19: callq  1e <foo+0x1e> # relocation to baz
1e: add    $0x8,%rsp
22: retq

whereas adding the -freorder-blocks optimization parameter (included in -O2) turns the code into:

 0: sub    $0x8,%rsp
 4: test   %edi,%edi
 6: jne    17 <foo+0x17>
 8: mov    $0x0,%eax
 d: callq  12 <foo+0x12> # relocation to baz
12: add    $0x8,%rsp
16: retq   
17: mov    $0x0,%eax
1c: callq  21 <foo+0x21> # relocation to bar
21: add    $0x8,%rsp
25: retq

what is mainly a change from jump equals to jump not equals. I know that up to Pentium 4, static branch prediction on a conditional forward branch was considered not taken by the processor (it seems that static prediction became random on further Intel processors), thus I imagine this optimization is dealing with this.

Assuming that and refering to the jne optimized version, it would mean that the else block is in fact considered to be more likely executed than the if block in the program flow.

But what does it mean exactly? Since there is no assumption on the a value in the foo function by the compiler, such probability relies on the programmer's writings only (who could in fact have used if ( !a ) instead of if ( a ) and inverted function calls).

Does that mean that it should be considered as a good practice to treat if conditional blocks as exceptional cases (and not the normal execution flow)?

That is:

if ( !cond ) {
    // exceptional code
}
else {
    // normal continuation
}

instead of:

if ( cond ) {
    // normal continuation
}
else {
    // exceptional code
}

(of course, one could prefer using return statement inside relevant block to limit indentation size).

like image 699
lledr Avatar asked Sep 01 '13 16:09

lledr


People also ask

How can branch prediction be improved?

One thing you can do in a high-level language is to eliminate branches by expressing the problem in terms of lookups or arithmetic. This helps branch prediction work better on the remaining branches, because there's more "history" available. I've made huge performance improvements to bottleneck code with this approach.

What is static branch prediction?

Static branch prediction Static prediction is the simplest branch prediction technique because it does not rely on information about the dynamic history of code executing. Instead, it predicts the outcome of a branch based solely on the branch instruction.

What are the two types of branch prediction techniques available?

Branch prediction technique can be of two types: Static Branch Prediction Technique. Dynamic Branch Prediction Technique.

How do you specify compiler optimizations in GCC?

GCC has a range of optimization levels, plus individual options to enable or disable particular optimizations. The overall compiler optimization level is controlled by the command line option -On, where n is the required optimization level, as follows: -O0 . (default).


1 Answers

I once had significant amount of performance optimization actions on ARM(7,9). It was plain C, dumb enough compiler (SDT AFAIR). One of the way to save some CPU resources was to analyse if branches and rewrite if condition so normal flow doesn't break linear instructions sequence. This had positive effect both because of CPU prediction block more efficient usage and more efficient code segment memory cache usage.

I think here we see optimization which is very close. In first code fragment both branches lead to normal sequence being broken (line with lavel 6 for one branch and 12 for another). In second fragment one branch instructions are ordered up to retq and other branch sequence has single jump (not worse than it was in first fragment). Please pay attention to 2 retq instructions.

So as I can see this is not the question of je or jne but rather question of blocks reordering so branches are linear instructions sequence with one of them entered without any jump and full prediction block power saved.

Regarding to "why GCC prefers one branch over another"... I see in documentation this can be result of static branch prediction (based on calls inside translation unit?). Anyway I'd recommend to play with __builtin_expect to have more detailed answer.

like image 131
Roman Nikitchenko Avatar answered Oct 12 '22 11:10

Roman Nikitchenko