
Branch-aware programming

I've been reading that branch misprediction can be a serious performance bottleneck for an application. From what I can see, people often show assembly code that reveals the problem and state that programmers can usually predict where a branch will go most of the time and so avoid branch mispredictions.

My questions are:

  1. Is it possible to avoid branch mispredictions using some high level programming technique (i.e. no assembly)?

  2. What should I keep in mind to produce branch-friendly code in a high level programming language (I'm mostly interested in C and C++)?

Code examples and benchmarks are welcome.

Paolo M asked Sep 15 '15 08:09




1 Answer

people often ... and state that programmers usually can predict where a branch could go

(*) Experienced programmers often point out that human programmers are very bad at predicting that.

1- Is it possible to avoid branch mispredictions using some high level programming technique (i.e. no assembly)?

Not in standard C++ or C, at least not for a single branch. What you can do is minimize the depth of your dependency chains so that a branch misprediction costs less. Modern CPUs execute speculatively along the predicted path and discard that work if the prediction turns out to be wrong. There is a limit to how much work can be in flight, however, which is why branch prediction mostly matters in deep dependency chains.

Some compilers provide extensions for suggesting the prediction manually, such as __builtin_expect in GCC. There is a Stack Overflow question about it. Even better, some compilers (such as GCC) support profiling the code and detecting the optimal predictions automatically. It's smart to use profiling rather than manual work because of (*).
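As a minimal sketch of the manual hint, assuming GCC or Clang: the LIKELY/UNLIKELY macros and the sum_non_negative function are hypothetical names made up for illustration; only __builtin_expect itself is the compiler builtin. The hint mainly influences how the compiler lays out the code, and it is not a guarantee about what the CPU's predictor does at run time.

```cpp
#include <cstddef>

// __builtin_expect is a GCC/Clang extension; fall back to a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

// Hypothetical example: sum the input, treating a negative value as a rare
// error. Returns false (leaving `out` untouched) if a negative value is seen.
bool sum_non_negative(const int* data, std::size_t n, long long& out) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (UNLIKELY(data[i] < 0)) {  // tell the compiler this path is cold
            return false;
        }
        total += data[i];
    }
    out = total;
    return true;
}
```

If the assumption behind the hint is wrong for your real data, the hint can hurt, which is another reason to prefer profiling. With GCC, the profile-guided route uses -fprofile-generate (build, then run a representative workload) followed by a rebuild with -fprofile-use.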

2- What should I keep in mind to produce branch-friendly code in a high level programming language (I'm mostly interested in C and C++)?

Primarily, keep in mind that branch misprediction will only affect you in the most performance-critical parts of your program, so do not worry about it until you have measured and found a problem.

But what can I do when some profiler (valgrind, VTune, ...) tells that on line n of foo.cpp I got a branch prediction penalty?

Lundin gave very sensible advice:

  1. Measure to find out whether it matters.
  2. If it matters, then
    • Minimize the depth of the dependency chains in your calculations. How to do that can be quite complicated; it is beyond my expertise, and there is not much you can do without diving into assembly. What you can do in a high-level language is minimize the number of conditional checks (**). Otherwise you are at the mercy of the compiler's optimizer. Avoiding deep dependency chains also allows more efficient use of out-of-order superscalar processors.
    • Make your branches consistently predictable. The effect of that can be seen in this Stack Overflow question, which has a loop over an array. The loop contains a branch that depends on the value of the current element. When the data was sorted, the loop was demonstrably much faster, at least when compiled with a particular compiler and run on a particular CPU. Of course, keeping all your data sorted also costs CPU time, possibly more than the branch mispredictions do, so measure.
  3. If it's still a problem, use profile-guided optimization (if available).

The order of 2 and 3 may be switched. Optimizing your code by hand is a lot of work. On the other hand, gathering the profiling data can be difficult for some programs as well.
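To make the points about conditional checks and predictable branches a bit more concrete, here is a rough sketch modeled loosely on the kind of loop described in the list. The function names and the threshold of 128 are my own assumptions for illustration. The first version has a data-dependent branch, which is cheap on sorted input and potentially expensive on random input; the second expresses the same sum without an explicit conditional in the source.

```cpp
#include <cstddef>
#include <cstdint>

// Branchy version: whether the `if` is taken depends on the data, so its
// predictability depends on how the data is ordered.
long long sum_large_branchy(const std::uint8_t* data, std::size_t n) {
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] >= 128) {
            sum += data[i];
        }
    }
    return sum;
}

// Same result without an `if` in the source: the comparison yields 0 or 1
// and is used as a multiplier, so there is no data-dependent jump to write.
long long sum_large_branchless(const std::uint8_t* data, std::size_t n) {
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i] * static_cast<long long>(data[i] >= 128);
    }
    return sum;
}
```

Whether the second version is actually faster is up to the compiler and the CPU. An optimizer may already turn the first version into a conditional move or vectorized code, so compare the generated code and measure both on your real data.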

(**) One way to do that is to transform your loops, for example by unrolling them. You can also let the optimizer do it automatically. You must measure, though, because unrolling will affect the way you interact with the cache and may well end up being a pessimization.
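A hand-unrolled loop might look like the sketch below. The function name and the unroll factor of 4 are assumptions chosen for illustration. The loop condition is checked once per four elements instead of once per element, and the four independent accumulators also keep each dependency chain shorter.

```cpp
#include <cstddef>

// Hypothetical example: sum an array with the loop unrolled by 4.
long long sum_unrolled4(const int* data, std::size_t n) {
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    // Main loop: one loop-condition check per four elements.
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    long long sum = s0 + s1 + s2 + s3;
    // Tail loop: handle the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

To let the optimizer try this instead, GCC has -funroll-loops. Either way, compare against the plain loop before keeping the change.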

eerorika answered Sep 30 '22 21:09