Branch-aware programming

Tags: c++, performance, branch-prediction, c, optimization

I've been reading that branch misprediction can be a major performance bottleneck for an application. From what I can see, people often show assembly code that reveals the problem and state that programmers can usually predict where a branch will go most of the time, and so avoid branch mispredictions.

My questions are:

1- Is it possible to avoid branch mispredictions using some high-level programming technique (i.e. no assembly)?
2- What should I keep in mind to produce branch-friendly code in a high-level programming language (I'm mostly interested in C and C++)?

Code examples and benchmarks are welcome.

asked Sep 15 '15 08:09

Paolo M

1 Answer

people often ... and state that programmers usually can predict where a branch could go

(*) Experienced programmers often point out that humans are very bad at predicting that.

1- Is it possible to avoid branch mispredictions using some high level programming technique (i.e. no assembly)?

Not in standard C++ or C, at least not for a single branch. What you can do is minimize the depth of your dependency chains so that branch misprediction has less effect. Modern CPUs speculatively execute the predicted path of a branch and discard that work if the prediction turns out to be wrong. There is a limit to how much of this the CPU can hide, however, which is why branch prediction mostly matters in deep dependency chains.

Some compilers provide extensions for hinting the expected outcome of a branch manually, such as __builtin_expect in GCC. Here is a Stack Overflow question about it. Even better, some compilers (such as GCC) support profiling the code and deriving the optimal predictions automatically from the collected data. It's smart to use profiling rather than manual hints because of (*).

2- What should I keep in mind to produce branch-friendly code in a high level programming language (I'm mostly interested in C and C++)?

First and foremost, keep in mind that branch misprediction will only matter in the most performance-critical parts of your program, and don't worry about it until you've measured and found a problem.

But what can I do when some profiler (valgrind, VTune, ...) tells that on line n of foo.cpp I got a branch prediction penalty?

Lundin gave very sensible advice:

1. Measure to find out whether it matters.
2. If it matters, then
   - Minimize the depth of the dependency chains of your calculations. How to do that can be quite complicated and is beyond my expertise, and there's not much you can do without diving into assembly. What you can do in a high-level language is minimize the number of conditional checks (**). Beyond that, you're at the mercy of the compiler's optimizer. Avoiding deep dependency chains also allows more efficient use of out-of-order superscalar processors.
   - Make your branches consistently predictable. The effect of this can be seen in this Stack Overflow question: a loop over an array contains a branch that depends on the value of the current element. When the data was sorted, the loop was demonstrably much faster when compiled with a particular compiler and run on a particular CPU. Of course, keeping all your data sorted also costs CPU time, possibly more than the branch mispredictions do, so measure.
3. If it's still a problem, use profile-guided optimization (if available).

The order of steps 2 and 3 may be switched. Optimizing your code by hand is a lot of work; on the other hand, gathering profiling data can be difficult for some programs as well.

(**) One way to do that is to transform your loops, for example by unrolling them. You can also let the optimizer do this automatically. You must measure, though, because unrolling affects how your code interacts with the cache and may well end up being a pessimization.

answered Sep 30 '22 21:09

eerorika
