Does the branch predictor kick in with this?

Most, if not all modern processors utilize a technique called "branch prediction", with which it guesses what way to go in an if-then-else branch.

I have a question considering the scheme. Let's say we have this piece of code, in no specific language:

if(someCondition)
{
    // some action
    return someValue;
}
// some other action
return someOtherValue;

Logically speaking, that code is equivalent to this code:

if(someCondition)
{
    // some action
    return someValue;
}
else
{
    // some other action
    return someOtherValue;
}

The branch predictor would 'predict' the branch in the second example, but what about the first example? Will it guess? What will be loaded into the pipeline? Is there any speed to be gained with either of the examples disregarding the impact of actual code in the blocks?

My guess, it is dependent on the compiler: If statements are implemented (in assembly) using jumps which are only undertaken if the compare flag in the register is set. Now what exactly the assembly instructions will look like depends on the compiler. Unless there is a common way of handling it that every compiler does, which I doubt there is, then this is compiler dependent. In that case, what would happen on the latest Visual Studio C++ and GC++ compilers?

As hexafraction pointed out, the relation between the return values as well as how someCondition is determined... the branch predictor might not kick in. Let us consider only true and false as return values. For the condition, let us assume both that it is a field which has been predetermined, either inside or outside the function, a local variable, and some arithmetic statement.

To be honest, I do not suspect there is much difference between the case that the condition is a local variable and the case that the field has been predetermined in the same function.

How does the branch predictor work?

In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the instruction pipeline.

Is branch prediction still used?

But the dominant CPU designs today rely on dynamic branch prediction. This technique is able to mostly avoid the frontend bubble problem, by predicting the correct address of the next instruction even for branches that aren't fully decoded and executed yet.

How does Intel branch prediction work?

It predicts where to fetch from next cycle, and then based on that prediction it predicts where to fetch from the cycle after that, and so on. It doesn't use any information about the decoded instructions, but only past behavior of the branches.

What happens if branch prediction is wrong?

If it guessed wrong it has to undo anything the speculatively executed instructions might have done, clear the pipeline and begin sending down the instructions it was supposed be executing. This results in the same delay as if it hadn't made a prediction at all.

Most likely gcc -O3 would optimise that to a branchless sequence, using a conditional-move instructions. e.g. on x86

# generate someValue in %rax, the x86-64 ABI's return value register
# generate someOtherValue in %rdi, to pick one at random
    test someCondition   # probably actually test or cmp a register
    cmovz  %rdi, %rax    # copy %rdi to %rax, if the zero flag is set.
    ret

cmov has a data dependency on both its inputs and on flags. A conditional branch is a control dependency. Using cmov is often good, unless it's part of one long dependency chain and the branch is fairly predictable.

If there was more work inside the if blocks, gcc would generate a conditional jump instruction.

# generate someValue in %rax
    test someCondition
    jz  .zero
    ret
.zero:
    # compute someOtherValue.  This work doesn't need to happen at all
    # if we don't end up needing it, unlike in the cmov case
    mov  someOtherValue, %rax
    ret

Branch prediction operates on conditional-jump instructions, not on high-level constructs. The same instructions are used to jump back to the top of a loop if the loop condition is true. According to http://agner.org/optimize/ , recent Intel CPUs remember patterns of up to 64 iterations for loops. So loops that run the same number of iterations every time don't have a branch mispredict on the last iteration, if the number of iterations is 64 or less.

So it's not the sequence of instructions that the branch predictor looks at to guess whether a jump will be taken or not-taken. Every separate branch instruction gets an entry in the branch-history-buffer when it's taken. And yes, every compiler has no choice but to use jcc (Jump on Condition Code) instructions to implement branches/loops.

The default is predict not-taken. If that prediction is correct, the CPU doesn't evict potentially-still-useful info from the cache to make room. See Agner Fog's microarch doc for more low-level details.

On Linux, to see the branch predictor in action, you can use perf stat:

perf stat /bin/ls  # in some big directory
    ... normal ls output

 Performance counter stats for '/bin/ls':

     10.403069      task-clock (msec)         #    0.094 CPUs utilized
         2,255      context-switches          #    0.217 M/sec
             0      cpu-migrations            #    0.000 K/sec
           190      page-faults               #    0.018 M/sec
    16,612,260      cycles                    #    1.597 GHz
     7,843,399      stalled-cycles-frontend   #   47.21% frontend cycles idle
     5,205,565      stalled-cycles-backend    #   31.34% backend  cycles idle
    20,227,093      instructions              #    1.22  insns per cycle
                                              #    0.39  stalled cycles per insn
     3,975,777      branches                  #  382.173 M/sec
########### These two lines ######
        55,785      branch-misses             #    1.40% of all branches

   0.110765717 seconds time elapsed

Intel Sandybridge (i5 2500k), at low-power clock speed, with the default cpufreq governor, doesn't ramp up in clock speed before ls is done.

There is no difference between those two code samples. The else is irrelevant because there is no need to branch at the end of the true clause. Even were that not true, the branch at the end of the true clause would not be conditional.

In other words, the code must compile to something like:

  Compute test expression
  Branch if false to false_label
  True action
  Return some value
False_label;
  False action
  Return some other value

Does the branch predictor kick in with this?

Tags:

performance

branch-prediction

optimization

compiler-optimization

univise

People also ask

2 Answers

Peter Cordes

rici

Recent Activity

Donate For Us

Does the branch predictor kick in with this?

Tags:

performance

branch-prediction

optimization

compiler-optimization

univise

People also ask

2 Answers

Peter Cordes

rici

Related questions

Recent Activity

Donate For Us