Since I started programming, I have read everywhere that wasteful branches should be avoided at all costs.
That's fine, although none of the articles explained why. What exactly happens when the CPU decodes a branch instruction and decides to jump? And what is the "thing" that makes it slower than other instructions (like addition)?
The Branch Unit is the part of the CPU that lets a program make decisions, perform jumps (changes to the PC) and call procedures. It executes jump and branch instructions under the control of the Dispatch Unit.
So, the branchless version is almost twice as fast as the branching version on my system (3.4 GHz).
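The benchmark behind that timing isn't shown, so the following is only a hypothetical sketch of what a "branching" vs "branchless" pair typically looks like in C. Both functions (`count_at_least_branchy` and `count_at_least_branchless`, names invented here) count the elements at or above a threshold; the second replaces the data-dependent `if` with arithmetic on the 0/1 result of the comparison.

```c
#include <stddef.h>

/* Branching version: the `if` typically compiles to a conditional jump
   whose outcome the CPU has to predict on every iteration. */
size_t count_at_least_branchy(const int *data, size_t n, int threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= threshold) {
            count++;
        }
    }
    return count;
}

/* Branchless version: the comparison evaluates to 0 or 1 and is added
   directly, so the loop body has no data-dependent jump. (The loop
   condition is still a branch, but it is highly predictable.) */
size_t count_at_least_branchless(const int *data, size_t n, int threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (size_t)(data[i] >= threshold);
    }
    return count;
}
```

Whether this helps depends on the data and the compiler: on unpredictable (e.g. random) input the branchy loop suffers frequent mispredictions, while on sorted input its branch is easy to predict. Optimising compilers may also turn the branchy version into a conditional move or a `SETcc` on their own, so it's worth checking the generated assembly before drawing conclusions.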
Branch instructions can alter the contents of the CPU's Program Counter (PC), called the Instruction Pointer on Intel microprocessors. The PC holds the memory address of the next machine instruction to be fetched and executed.
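To make the role of the PC concrete, here is a toy interpreter in C. The instruction set (`OP_ADD`, `OP_JNZ`, `OP_HALT`) and the `run_toy` function are hypothetical and don't correspond to any real CPU. Ordinary instructions advance the PC by one; the conditional branch `OP_JNZ` overwrites it with a target address when its register is non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction set, for illustration only. */
enum opcode { OP_ADD, OP_JNZ, OP_HALT };

struct instr {
    enum opcode op;
    int a; /* register index (0 or 1) */
    int b; /* OP_ADD: immediate to add; OP_JNZ: absolute target address */
};

/* Executes `prog` with two registers and returns regs[0] on OP_HALT. */
long run_toy(const struct instr *prog, size_t len, long regs[2]) {
    size_t pc = 0; /* program counter: index of the next instruction */
    for (;;) {
        assert(pc < len); /* running off the end is a program bug */
        const struct instr *in = &prog[pc];
        switch (in->op) {
        case OP_ADD:
            regs[in->a] += in->b;
            pc = pc + 1; /* ordinary instruction: fall through to the next */
            break;
        case OP_JNZ:
            /* conditional branch: the next PC depends on a runtime value */
            pc = (regs[in->a] != 0) ? (size_t)in->b : pc + 1;
            break;
        case OP_HALT:
            return regs[0];
        }
    }
}
```

For example, the program `ADD r1, 3; ADD r0, 10; ADD r1, -1; JNZ r1, 1; HALT` jumps back to address 1 until `r1` reaches zero, so it runs the loop body three times and leaves 30 in `r0`.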
A branch instruction is not inherently slower than any other instruction.
However, the reason you've heard that branches should be avoided is that modern CPUs use a pipelined architecture: several sequential instructions are in flight at the same time, each at a different stage of execution. But the pipeline can only be fully utilised if it can fetch the next instruction from memory on every cycle, which in turn means it needs to know which instruction to fetch.
On a conditional branch, the CPU usually doesn't know ahead of time which path will be taken. When this happens, it has to stall until the condition has been resolved, discarding anything in the pipeline behind the branch instruction that turns out to be on the wrong path. This lowers utilisation, and therefore performance.
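As a rough back-of-the-envelope model, assume an idealised pipeline that completes one instruction per cycle, except that every branch stalls it for a fixed number of cycles. The numbers below (20% of instructions are branches, 3 stall cycles per branch) are assumptions chosen for illustration, not measurements:

```latex
\mathrm{CPI}_{\text{eff}} = \mathrm{CPI}_{\text{base}} + f_{\text{branch}} \cdot c_{\text{stall}}
                          = 1 + 0.2 \cdot 3 = 1.6
```

Under these assumptions the pipeline delivers only 1/1.6 = 62.5% of its ideal throughput, purely because of branch stalls.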
This is the reason that things like branch prediction and branch delay slots exist.
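Branch prediction lets the CPU keep fetching down a guessed path instead of stalling, so only mispredicted branches pay the flush penalty. A classic textbook scheme is the 2-bit saturating counter, sketched below in C. Real CPUs use considerably more elaborate predictors, and the names here (`predictor2`, `count_mispredictions`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* 2-bit saturating counter: states 0 and 1 predict "not taken",
   states 2 and 3 predict "taken". */
typedef struct {
    unsigned state;
} predictor2;

static bool predictor2_predict(const predictor2 *p) {
    return p->state >= 2;
}

/* Move one step towards the actual outcome, saturating at 0 and 3, so a
   single surprise doesn't flip a strongly biased prediction. */
static void predictor2_update(predictor2 *p, bool taken) {
    if (taken && p->state < 3) {
        p->state++;
    } else if (!taken && p->state > 0) {
        p->state--;
    }
}

/* Counts how often the predictor guesses wrong for a sequence of
   outcomes of a single branch, starting in "weakly not taken". */
size_t count_mispredictions(const bool *outcomes, size_t n) {
    predictor2 p = {1};
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        if (predictor2_predict(&p) != outcomes[i]) {
            misses++;
        }
        predictor2_update(&p, outcomes[i]);
    }
    return misses;
}
```

A loop branch that is taken nine times and then falls through is mispredicted only twice (on the first and the last iteration), whereas a strictly alternating pattern is mispredicted on every execution with this scheme. This is why branches on unpredictable data are the ones most worth eliminating.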