Static branch prediction / GCC optimization

Tags: c, optimization, gcc, assembly, x86-64

Consider the following C program:

void bar();
void baz();

void foo( int a ) {
    if ( a ) {
        bar();
    }
    else {
        baz();
    }
}

On my x86-64 machine, GCC at the -O1 optimization level generates the following instructions:

 0: sub    $0x8,%rsp
 4: test   %edi,%edi
 6: je     14 <foo+0x14>
 8: mov    $0x0,%eax
 d: callq  12 <foo+0x12> # relocation to bar
12: jmp    1e <foo+0x1e>
14: mov    $0x0,%eax
19: callq  1e <foo+0x1e> # relocation to baz
1e: add    $0x8,%rsp
22: retq

whereas adding the -freorder-blocks optimization flag (included in -O2) turns the code into:

 0: sub    $0x8,%rsp
 4: test   %edi,%edi
 6: jne    17 <foo+0x17>
 8: mov    $0x0,%eax
 d: callq  12 <foo+0x12> # relocation to baz
12: add    $0x8,%rsp
16: retq   
17: mov    $0x0,%eax
1c: callq  21 <foo+0x21> # relocation to bar
21: add    $0x8,%rsp
25: retq

which is mainly a change from je (jump if equal) to jne (jump if not equal). I know that up to the Pentium 4, the processor statically predicted a forward conditional branch as not taken (it seems that static prediction became random on later Intel processors), so I imagine this optimization is related to that.

Assuming that, and referring to the jne-optimized version, it would mean that the else block is in fact considered more likely to be executed than the if block in the program flow.

But what does that mean exactly? Since the compiler makes no assumption about the value of a in foo, this probability depends only on how the programmer wrote the code (who could just as well have used if ( !a ) instead of if ( a ) and swapped the function calls).

Does that mean it should be considered good practice to treat the if block as the exceptional case (and not the normal execution flow)?

That is:

if ( !cond ) {
    // exceptional code
}
else {
    // normal continuation
}

instead of:

if ( cond ) {
    // normal continuation
}
else {
    // exceptional code
}

(Of course, one could prefer a return statement inside the relevant block to limit indentation.)
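A minimal sketch of that early-return variant, where the exceptional case is handled first and the normal flow continues unindented. The function sum_array and its error convention are hypothetical, chosen only for illustration; this says nothing about which layout GCC will actually pick.

```c
#include <stddef.h>

/* Hypothetical example: returns -1 on invalid input, 0 on success.
   The exceptional case is tested first and leaves the function early,
   so the normal continuation needs no else block. */
int sum_array(const int *values, size_t count, int *out) {
    if (values == NULL || out == NULL) {
        return -1;  /* exceptional code */
    }

    /* normal continuation */
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    *out = total;
    return 0;
}
```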

699

asked Sep 01 '13 16:09

lledr

1 Answer

I once did a significant amount of performance optimization work on ARM (ARM7, ARM9). It was plain C with a fairly dumb compiler (SDT, as far as I recall). One way to save CPU resources was to analyze the if branches and rewrite each condition so that the normal flow does not break the linear instruction sequence. This had a positive effect both because the CPU's prediction logic was used more efficiently and because the instruction cache was used more efficiently.

I think what we see here is a very similar optimization. In the first code fragment, both branches break the linear sequence (the je at offset 6 for one branch and the jmp at offset 12 for the other). In the second fragment, one branch's instructions run straight through to its retq, and the other branch takes a single jump (no worse than in the first fragment). Note the two retq instructions.

So, as far as I can see, this is not a question of je versus jne but rather of block reordering: each branch becomes a linear instruction sequence, and one of them is entered by falling through, without any taken jump, which spares the prediction hardware.

Regarding "why GCC prefers one branch over another": from the documentation, I gather this can be the result of static branch prediction (based on the calls inside the translation unit?). In any case, I'd recommend experimenting with __builtin_expect to get a more detailed answer.
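A minimal sketch of such an experiment, assuming GCC (or a compiler that supports the __builtin_expect builtin). The likely/unlikely macro names are a common convention rather than part of GCC, and the counting bodies of bar and baz are stand-ins so the example is self-contained. The hint only tells the compiler which outcome to expect; the resulting block layout may still vary by GCC version and flags.

```c
/* Conventional wrappers around GCC's __builtin_expect.
   !!(x) normalizes any non-zero value to 1. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Stand-ins for bar() and baz() from the question; the counters are
   an assumption added only so the behavior can be checked. */
static int bar_calls = 0;
static int baz_calls = 0;
static void bar(void) { bar_calls++; }
static void baz(void) { baz_calls++; }

/* Hint: a != 0 is expected, so the bar() path is the expected one. */
void foo_expect_bar(int a) {
    if (likely(a)) {
        bar();
    }
    else {
        baz();
    }
}

/* Hint: a == 0 is expected, so the baz() path is the expected one. */
void foo_expect_baz(int a) {
    if (unlikely(a)) {
        bar();
    }
    else {
        baz();
    }
}
```

Both functions behave identically; only the hint differs. Compiling with gcc -O2 -c and comparing the two functions in the objdump -d output should show whether the expected path ends up as the fall-through one on a given setup.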

131

answered Oct 12 '22 11:10

Roman Nikitchenko
