Which instructions can produce a branch misprediction on x86 CPUs?

Question

I have a test question here.

Which instructions might slow down the processor when the pipeline fails to predict (branch prediction) the path that execution will take next?

Possible answers: JGE | ADD | SUB | PUSH | JMP | JNZ | MUL | JG | CALL

If we are talking about branch prediction, are JGE, JMP, JNZ & JG the way to go?

Peter Cordes · Accepted Answer

Instructions like mul that don't do anything special to EIP can't mispredict, of course, but every kind of jump / call / branch can mispredict to some degree in a pipelined design, even a simple call rel32. The effects can be serious in a heavily pipelined out-of-order execution design like modern x86 CPUs.
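To make that concrete for the answer choices in the question, here is a minimal sketch that sorts the listed mnemonics into "changes EIP, so it needs prediction" and "doesn't". The BRANCH_KIND table and the can_mispredict helper are names made up for this illustration, and the category labels are informal.

```python
# Illustrative classification of the mnemonics from the question.
# BRANCH_KIND and can_mispredict are hypothetical names for this sketch.
BRANCH_KIND = {
    "JGE": "conditional",
    "JNZ": "conditional",
    "JG": "conditional",
    "JMP": "unconditional",   # direct or indirect, depending on the operand
    "CALL": "unconditional",  # direct or indirect, depending on the operand
}


def can_mispredict(mnemonic: str) -> bool:
    """True if the instruction redirects EIP, so the front-end has to predict it."""
    return mnemonic.upper() in BRANCH_KIND


choices = ["JGE", "ADD", "SUB", "PUSH", "JMP", "JNZ", "MUL", "JG", "CALL"]
print([m for m in choices if can_mispredict(m)])
# prints ['JGE', 'JMP', 'JNZ', 'JG', 'CALL']
```

So the list in the question is missing CALL; ADD, SUB, PUSH and MUL never redirect EIP.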

Yes, jcc conditional branches always need prediction; the value of FLAGS isn't available when decoding, only later when executing.
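As a rough illustration of what "needs prediction" means for a jcc, here is a toy model of the textbook 2-bit saturating counter. This is an assumption chosen for simplicity: real x86 predictors are far more sophisticated, and the class and function names are invented for the sketch.

```python
class TwoBitCounter:
    """Textbook 2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self, state: int = 1):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredicts(outcomes) -> int:
    """Run one branch's outcome history through a fresh counter and count wrong guesses."""
    predictor = TwoBitCounter()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical outcome histories for a single conditional branch.
loop_branch = ([True] * 9 + [False]) * 10  # loop back-edge: taken 9 times, then falls through
alternating = [True, False] * 50           # taken / not-taken / taken / ...

print(count_mispredicts(loop_branch), "of", len(loop_branch))
print(count_mispredicts(alternating), "of", len(alternating))
```

In this toy model the loop-style branch mispredicts roughly once per loop exit, while the alternating pattern defeats this particular counter on every execution. The same jcc can be cheap or expensive depending on how predictable its FLAGS input is.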

Even direct jmp rel8 / jmp rel32 (and call rel32) need prediction early in the front-end, before they're even decoded, so the fetch stage knows which block to fetch next after fetching a block that might or might not include a jump. The jump could be unconditional or a predicted-taken conditional; the fetch stage doesn't need to know which, just whether to keep fetching in a straight line or not. See "Slow jmp-instruction" for more about simple unconditional direct branches running slower if you have too many for the BTB (branch target buffer).
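A toy model can show why too many direct jumps for the BTB hurts. The sketch below assumes a direct-mapped BTB indexed by 16-byte fetch block; the sizes, the indexing scheme, and the names ToyBTB and resteers are all hypothetical and not a description of any real CPU.

```python
class ToyBTB:
    """Direct-mapped branch target buffer, indexed by 16-byte fetch block (toy model)."""

    def __init__(self, entries: int):
        self.entries = entries
        self.table = {}  # index -> (branch address used as tag, target)

    def _index(self, addr: int) -> int:
        return (addr >> 4) % self.entries

    def predict(self, addr: int, fallthrough: int) -> int:
        slot = self.table.get(self._index(addr))
        if slot is not None and slot[0] == addr:
            return slot[1]   # hit: steer fetch to the remembered target
        return fallthrough   # miss: keep fetching in a straight line

    def update(self, addr: int, target: int) -> None:
        self.table[self._index(addr)] = (addr, target)


def resteers(btb: ToyBTB, jumps, passes: int) -> int:
    """Count wrong next-fetch guesses for a chain of direct jumps executed repeatedly."""
    wrong = 0
    for _ in range(passes):
        for addr, target in jumps:
            # jmp rel8 is 2 bytes long, so straight-line fetch would continue at addr + 2.
            if btb.predict(addr, fallthrough=addr + 2) != target:
                wrong += 1
            btb.update(addr, target)
    return wrong


# Hypothetical chain of jmp rel8 instructions, one per 16-byte block.
chain = [(0x1000 + 16 * i, 0x1000 + 16 * (i + 1)) for i in range(8)]

print(resteers(ToyBTB(entries=64), chain, passes=10))  # only the first pass misses
print(resteers(ToyBTB(entries=4), chain, passes=10))   # jumps keep evicting each other
```

With enough entries only the cold first pass guesses wrong. With too few, every jump evicts another one's entry and the fetch stage guesses "straight line" every time, even though the jumps are unconditional and direct.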

If you consider a simple in-order pipeline like a classic 5-stage RISC, with no buffers between stages, all branches are basically equivalent: the fetch stage needs to fetch 1 instruction per clock to avoid bubbles. It needs to know the next fetch address while the previous instruction is still decoding. Longer pipelines make this problem even worse.

But more simply, there are indirect forms of jmp and call like jmp eax or jmp [edi] that load a new EIP from a register or memory. Those obviously need prediction; you have unlimited possibilities for where it will go, not just taken or not-taken.
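To illustrate the "unlimited possibilities" point, here is a minimal sketch of a last-target predictor for one indirect jump such as jmp eax. The strategy is an assumption picked for simplicity (real indirect predictors also use branch history), and the function name and addresses are made up.

```python
def last_target_mispredicts(targets) -> int:
    """Predict that an indirect jump goes wherever it went last time; count wrong guesses."""
    predicted = None  # nothing known before the first execution
    misses = 0
    for actual in targets:
        if predicted != actual:
            misses += 1
        predicted = actual
    return misses


# Hypothetical target histories for a single indirect jump.
same_target = [0x4010] * 20                  # e.g. a function pointer that never changes
rotating = [0x4010, 0x4020, 0x4030] * 20     # e.g. a dispatch that cycles through 3 handlers

print(last_target_mispredicts(same_target))  # prints 1
print(last_target_mispredicts(rotating))     # prints 60
```

A stable target costs one cold miss, but a target that changes every time is mispredicted on every execution by this simple scheme, which a taken / not-taken bit could never even express.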

Branches that depend on data (conditional on FLAGS, or indirect on register or memory) can get all the way into the back-end (and execute out-of-order) before a mispredict is discovered. Recovering may require discarding results of executing later instructions from the wrong path, as well as fetching/decoding the correct path. See "What exactly happens when a skylake CPU mispredicts a branch?" for details.

But handling mispredicts of direct jmp/call is simpler: just re-steer the fetch/decode stages because the target address is known after decoding the instruction, without having to execute it. The misprediction doesn't make it into the back-end so it's "just" a bubble in the front-end.

Fun fact: ret can also mispredict; it's basically an indirect branch (pop eip). But there are special predictors that take advantage of the usual pairing between call and ret instructions, keeping an internal stack of recent calls that mirrors how the callstack in memory will probably be used. See http://blog.stuffedcow.net/2018/04/ras-microbenchmarks/ for microbenchmarks of this return-address predictor.
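The call/ret pairing can be sketched as a small fixed-depth stack. This is a toy model: the depth, the drop-the-oldest overflow policy, and the names ReturnAddressStack and ret_mispredicts are assumptions for illustration, and real hardware differs in how it handles overflow and underflow.

```python
class ReturnAddressStack:
    """Fixed-depth stack of predicted return addresses (toy model)."""

    def __init__(self, depth: int):
        self.depth = depth
        self.stack = []

    def on_call(self, return_addr: int) -> None:
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: the oldest entry is lost (assumed policy)
        self.stack.append(return_addr)

    def predict_ret(self):
        # Empty stack means no prediction available; counted as a miss below.
        return self.stack.pop() if self.stack else None


def ret_mispredicts(depth: int, nesting: int) -> int:
    """Nest `nesting` calls, then return from all of them; count wrong ret predictions."""
    ras = ReturnAddressStack(depth)
    return_addrs = [0x1000 + 5 * i for i in range(nesting)]  # hypothetical return addresses
    for addr in return_addrs:
        ras.on_call(addr)
    misses = 0
    for addr in reversed(return_addrs):
        if ras.predict_ret() != addr:
            misses += 1
    return misses


print(ret_mispredicts(depth=16, nesting=8))  # prints 0
print(ret_mispredicts(depth=4, nesting=8))   # prints 4
```

As long as calls and returns pair up and the nesting fits in the predictor's stack, every ret is predicted correctly. In this model, once the nesting exceeds the depth, the outermost returns have lost their entries and mispredict.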

Tags:

branch-prediction

x86

assembly

cpu

pipeline

Rimvydas Kanapka
