Branch penalty in pipeline results from non-zero distance between ALU and IF.
What does this statement mean?
Without (correct) branch prediction, fetch doesn't know what to fetch next until the ALU decides which way a conditional or indirect branch goes. So it stalls until the branch executes in the ALU.
Or with an incorrect prediction, the fetched/decoded instructions from the wrong path are useless and have to be discarded. That lost work is the branch mispredict penalty; branch prediction hides it in the normal case.
Another term for this is "branch latency" - the number of cycles from fetching a branch instruction until the front-end fetches a useful next instruction.
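To get a feel for how much prediction hides, here is a rough sketch of the usual back-of-the-envelope model: average cycles per instruction grows by (fraction of instructions that are branches) × (fraction of those that pay the penalty) × (penalty in cycles). The function name and all the numbers are made up for illustration, not measurements of any real CPU.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once branch stalls are included.

    Simplified model: every branch that isn't correctly predicted costs
    exactly `penalty_cycles`, and nothing else stalls the pipeline.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 3-cycle penalty.
# With no prediction, every branch pays the penalty (rate = 1.0).
no_prediction = effective_cpi(1.0, 0.20, 1.0, 3)
# With a predictor that is right 95% of the time, only 5% pay it.
with_prediction = effective_cpi(1.0, 0.20, 0.05, 3)

print(f"no prediction:   {no_prediction:.2f} CPI")
print(f"with prediction: {with_prediction:.2f} CPI")
```

With these assumed numbers, the CPI drops from about 1.6 to about 1.03, which is what "hides it in the normal case" means in practice.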
Note that even unconditional branches have branch latency: the fact that an instruction is a branch at all isn't known until after it's decoded. This is earlier in the pipeline than execution so the possible penalty is smaller than for conditional or indirect branches.
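The "distance" in the quoted statement can be sketched as a stage count. This is a toy model, assuming a classic 5-stage pipeline where each stage takes one full cycle and the next fetch address only becomes usable in the cycle after the resolving stage finishes. The stage list and the `branch_latency` helper are illustrative names, and the model ignores sub-cycle timing tricks that real designs use to shave this down.

```python
# Classic 5-stage RISC pipeline, in program order of the stages.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def branch_latency(resolve_stage, stages=STAGES):
    """Fetch cycles lost after a branch in this toy model.

    The branch is fetched in IF; the correct next-fetch address is known
    only at the end of `resolve_stage`. Every fetch slot in between is
    either a stall or a wrong-path instruction.
    """
    return stages.index(resolve_stage) - stages.index("IF")


# Conditional/indirect branch resolved by the ALU in EX.
print("resolved in EX:", branch_latency("EX"), "cycles")
# Unconditional direct branch whose target is known after decode.
print("resolved in ID:", branch_latency("ID"), "cycles")
```

In this model a branch resolved in EX loses 2 fetch cycles and one resolved in ID loses 1. If the ALU could somehow sit in the same stage as fetch, the distance and the penalty would both be zero.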
For example, in the first-gen MIPS R2000, a classic 5-stage RISC, conditional branches only take half a cycle in the EX stage, and IF doesn't need the address until the 2nd half of a clock cycle, so the total branch latency is kept down to 1 cycle. MIPS hides that latency with a branch-delay slot: the instruction after a branch always executes, whether the branch is taken or not. (This includes unconditional direct branches; the ID stage can produce the target address on its own.) Later, more deeply pipelined MIPS CPUs (especially superscalar and/or out-of-order ones) did need branch prediction, because the delay slot could no longer fully hide the branch latency.