I've often read that branching, at the assembly instruction level, is bad from a performance perspective, but I haven't really seen why that's so. So, why is it bad?
Most modern processors prefetch instructions and even speculatively execute them before the code flow has reached that instruction. Having a branch means that there are suddenly two different instructions that might be the next instruction. There are at least three ways this can interact with prefetching: the processor can stall until the branch is resolved, it can predict which way the branch will go and prefetch only that path, or it can prefetch down both paths and discard the one not taken.
Depending on the processor and the specific code, a branch may or may not have a significant performance impact compared to equivalent branch-free code. If the processor uses branch prediction (most do) and mostly guesses correctly for a given piece of code, the branch may cost very little. If it mostly guesses incorrectly, the branch can cause a huge slowdown.
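As a rough illustration of the prediction effect, here is a sketch of a micro-benchmark (function names and sizes are made up) that times the same branchy loop over random data, where the branch is hard to predict, and over the same data once sorted, where the branch becomes almost perfectly predictable. The absolute numbers and the size of the gap depend entirely on the CPU and compiler.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Sums only the elements >= 128; the `if` is the branch under test.
static long long sum_over_threshold(const std::vector<int>& v) {
    long long sum = 0;
    for (int x : v) {
        if (x >= 128)
            sum += x;
    }
    return sum;
}

int main() {
    std::vector<int> data(1 << 20);
    for (int& x : data) x = std::rand() % 256;

    auto time_it = [&](const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        long long s = sum_over_threshold(data);
        auto t1 = std::chrono::steady_clock::now();
        long long us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        std::printf("%s: sum=%lld, %lld us\n", label, s, us);
    };

    time_it("random (branch hard to predict)");
    std::sort(data.begin(), data.end());  // same data, now the branch pattern is predictable
    time_it("sorted (branch easy to predict)");
    return 0;
}
```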
It can be hard to predict whether removing a branch will significantly speed up a specific piece of code. When micro-optimizing, it is best to measure the performance of both approaches rather than guess.
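For example, here is a sketch of a branchy and a branchless version of the same counting loop (both function names are hypothetical). On highly predictable data the branchy version can be just as fast or faster; on unpredictable data the branchless one often wins, which is exactly why measuring beats guessing.

```cpp
#include <vector>

// Branchy version: relies on the branch predictor guessing the comparison.
long long count_branchy(const std::vector<int>& v, int threshold) {
    long long count = 0;
    for (int x : v) {
        if (x >= threshold)
            ++count;
    }
    return count;
}

// Branchless version: the comparison yields 0 or 1, so there is no
// data-dependent branch; compilers can often lower this to a conditional
// move or vectorize it.
long long count_branchless(const std::vector<int>& v, int threshold) {
    long long count = 0;
    for (int x : v)
        count += (x >= threshold);
    return count;
}
```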
It's bad because it interferes with instruction prefetch. Modern processors start loading the bytes of the next instructions while still executing the current one, in order to run faster. When a branch goes a different way than the processor expected, the prefetched instructions have to be thrown away, which wastes time. Inside a tight loop or the like, those wasted prefetches can add up.
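One way to help the processor keep prefetching down the common path is to tell the compiler which side of a branch is expected. The sketch below (a made-up function over made-up data) uses the standard C++20 `[[unlikely]]` attribute; GCC and Clang also offer `__builtin_expect`. Any gain is small and depends on the compiler and CPU, so this is only worth doing after measuring.

```cpp
#include <vector>

// Sums non-negative values; negative values are assumed to be a rare
// error case, so the hint keeps the hot path straight-line and prefetchable.
long long sum_ignoring_rare_errors(const std::vector<int>& v) {
    long long sum = 0;
    for (int x : v) {
        if (x < 0) [[unlikely]] {
            continue;   // rare path, kept off the hot path
        }
        sum += x;       // common path the CPU keeps prefetching
    }
    return sum;
}
```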