I have a test question here.
Which instructions can potentially slow the processor down when the pipeline fails to predict (branch prediction) the path execution will take?
Possible answers: JGE | ADD | SUB | PUSH | JMP | JNZ | MUL | JG | CALL
If we are talking about branch prediction, are JGE, JMP, JNZ & JG the way to go?
Instructions like mul that don't do anything special to EIP of course can't mispredict, but every kind of jump / call / branch can mispredict to some degree in a pipelined design, even a simple call rel32. The effects can be serious in a heavily pipelined out-of-order execution design like modern x86 CPUs.
Yes, jcc conditional branches always need prediction; the value of FLAGS isn't available when decoding, only later when executing.
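To make "needs prediction" concrete, here is a minimal sketch of a textbook 2-bit saturating counter guessing the direction of one conditional branch. This is an illustration only: real x86 predictors are far more elaborate and mostly undocumented, and the names TwoBitPredictor and count_mispredicts are made up for this example.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter for a single branch (assumed model)."""

    def __init__(self, state=2):
        # States 0 and 1 predict not-taken; states 2 and 3 predict taken.
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredicts(outcomes, predictor=None):
    """Count wrong guesses over a sequence of actual taken/not-taken outcomes."""
    if predictor is None:
        predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch (e.g. jnz at the bottom of a loop): taken 9 times, then falls through.
loop_branch = ([True] * 9 + [False]) * 3
print(count_mispredicts(loop_branch))
```

In this toy model the loop branch mispredicts only once per loop exit, while a strictly alternating taken/not-taken pattern mispredicts on every second execution.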
Even direct jmp rel8 / jmp rel32 (and call rel32) need prediction early in the front-end, before they're even decoded, so the fetch stage knows which block to fetch next after fetching a block that might or might not include a jump. (Whether that jump is unconditional or a predicted-taken conditional doesn't matter to the fetch stage; it only needs to know whether to keep fetching in a straight line or not.) See "Slow jmp-instruction" for more about simple unconditional direct branches running slower if you have too many for the BTB (branch target buffer).
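The BTB capacity effect can be sketched with a toy direct-mapped table indexed by branch address. Everything here is an assumption for illustration (the TinyBTB class, the modulo indexing, the tiny sizes and made-up addresses); real BTBs are set-associative, multi-level, and their organization differs between microarchitectures.

```python
class TinyBTB:
    """Toy direct-mapped branch target buffer (hypothetical organization)."""

    def __init__(self, entries):
        self.entries = entries
        self.table = {}  # slot index -> (branch address, predicted target)

    def lookup(self, addr):
        slot = self.table.get(addr % self.entries)
        if slot is not None and slot[0] == addr:
            return slot[1]
        return None  # no entry: fetch would just continue in a straight line

    def update(self, addr, target):
        # Newer branches evict whatever previously mapped to the same slot.
        self.table[addr % self.entries] = (addr, target)


def btb_misses(branches, btb, passes):
    """Run the same list of (address, target) direct jumps several times."""
    misses = 0
    for _ in range(passes):
        for addr, target in branches:
            if btb.lookup(addr) != target:
                misses += 1  # front-end has to be re-steered after decode
            btb.update(addr, target)
    return misses


few_jumps = [(a, a + 100) for a in range(4)]   # fits in an 8-entry table
many_jumps = [(a, a + 100) for a in range(8)]  # too many for a 4-entry table
print(btb_misses(few_jumps, TinyBTB(8), passes=3))
print(btb_misses(many_jumps, TinyBTB(4), passes=3))
```

When the jumps fit, only the first pass misses; once there are more jumps than entries, they keep evicting each other and every execution misses in this model, even though each jump is unconditional and always goes to the same place.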
If you consider a simple in-order pipeline like a classic 5-stage RISC, with no buffers between stages, all branches are basically equivalent: the fetch stage needs to fetch 1 instruction per clock to avoid bubbles. It needs to know the next fetch address while the previous instruction is still decoding. Longer pipelines make this problem even worse.
But more simply, there are indirect forms of jmp and call like jmp eax or jmp [edi] that load a new EIP from a register or memory. Those obviously need prediction; there are unlimited possibilities for where they will go, not just taken or not-taken.
Branches that depend on data (conditional on FLAGS, or indirect on register or memory) can get all the way into the back-end (and execute out-of-order) before a mispredict is discovered. Recovering may require discarding results of executing later instructions from the wrong path, as well as fetching/decoding the correct path. See "What exactly happens when a skylake CPU mispredicts a branch?" for details.
But handling mispredicts of direct jmp/call is simpler: just re-steer the fetch/decode stages because the target address is known after decoding the instruction, without having to execute it. The misprediction doesn't make it into the back-end so it's "just" a bubble in the front-end.
Fun fact: ret can also mispredict; it's basically an indirect branch (pop eip). But there are special predictors that take advantage of the usual pairing between call and ret instructions, keeping an internal stack of recent calls that mirrors how the callstack in memory will probably be used. See http://blog.stuffedcow.net/2018/04/ras-microbenchmarks/
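The call/ret pairing idea can be sketched as a small fixed-depth stack of predicted return addresses. This is a simplified model under stated assumptions: the ReturnAddressStack class, the depth of 16, the drop-oldest overflow policy, and the addresses are all hypothetical, and real hardware behaves differently in the details (the linked microbenchmarks explore that).

```python
class ReturnAddressStack:
    """Toy return-address predictor with a fixed depth (assumed policy)."""

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []

    def on_call(self, return_addr):
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: forget the oldest entry
        self.stack.append(return_addr)

    def predict_ret(self):
        # Pop the most recent call's return address, if any is left.
        return self.stack.pop() if self.stack else None


def ret_mispredicts(call_depth, ras_depth):
    """Nest call_depth calls, then return from all of them; count bad guesses."""
    ras = ReturnAddressStack(ras_depth)
    real_stack = []  # stands in for the return addresses on the memory stack
    for level in range(call_depth):
        addr = 0x1000 + level  # made-up return addresses
        real_stack.append(addr)
        ras.on_call(addr)
    misses = 0
    while real_stack:
        actual = real_stack.pop()
        if ras.predict_ret() != actual:
            misses += 1
    return misses


print(ret_mispredicts(call_depth=4, ras_depth=16))
print(ret_mispredicts(call_depth=20, ras_depth=16))
```

In this model, matched call/ret pairs predict perfectly as long as nesting stays within the predictor's depth; nesting deeper than that loses the oldest entries, so the outermost returns mispredict.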