 

Why is it good to avoid instruction branching where possible?

I've often read that branching, at the assembly-instruction level, is bad from a performance perspective. But I haven't really seen why that is. So, why?

asked Apr 14 '11 by Johann Gerell

2 Answers

Most modern processors prefetch instructions and even speculatively execute them before the code flow has reached that instruction. Having a branch means that there are suddenly two different instructions that might be the next instruction. There are at least three possible ways that this can interact with pre-fetching:

  • Instructions after branches aren't pre-fetched. The instruction pipeline becomes empty and the processor must wait as the next instruction is fetched at the last moment, giving worse performance.
  • The processor can guess which branch will be taken (branch prediction) and prefetch and execute the appropriate instructions. If it guesses wrong, though, it has to discard the speculative work and wait for the correct instruction to be fetched.
  • The processor can fetch and execute both branches, then discard the results from the branch that wasn't taken.

Depending on the processor and the specific code, the branch may or may not have a significant performance impact compared to equivalent code without a branch. If the processor executing the code uses branch prediction (most do) and mostly guesses correctly for a specific piece of code, the branch may not cost much. On the other hand, if it mostly guesses incorrectly, the branch can cause a huge slowdown.

It can be hard to predict, for a specific piece of code, whether removing a branch will speed it up significantly. When micro-optimizing, it is best to measure the performance of both approaches rather than guess.

answered Sep 21 '22 by Mark Byers


It's bad because it interferes with instruction prefetch. Modern processors can start loading the next instruction's bytes while still processing the current one, in order to run faster. When a branch is taken, the "next instruction" that was prefetched has to be thrown away, which wastes time. Inside a tight loop or the like, those missed prefetches can add up.
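One common way to reduce that per-iteration cost is to hoist a loop-invariant branch out of the tight loop entirely (compilers call this loop unswitching and often do it themselves). The sketch below is my own illustration of the idea, not something the answer prescribes:

```cpp
#include <vector>

// Branch inside the tight loop: the condition is tested every
// iteration even though its outcome never changes.
void scale_or_zero(std::vector<float>& v, bool zero) {
    for (float& x : v) {
        if (zero) x = 0.0f;
        else      x *= 2.0f;
    }
}

// Branch hoisted out of the loop: the test runs once, and each
// loop body is branch-free apart from the loop condition itself.
void scale_or_zero_unswitched(std::vector<float>& v, bool zero) {
    if (zero) {
        for (float& x : v) x = 0.0f;
    } else {
        for (float& x : v) x *= 2.0f;
    }
}
```

Both functions produce identical results; the second just gives the prefetcher a straight run of instructions inside the loop.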

answered Sep 20 '22 by Anomie