What happens with nested branches and speculative execution?

Tags: cpu-architecture, branch-prediction, speculative-execution, nested-if

Alright, so I know that if a particular conditional branch has a condition that takes time to compute (a memory access, for instance), the CPU predicts the outcome of the condition and speculatively executes along that path. However, what would happen if, along that path, yet another slow conditional branch pops up (assuming, of course, that the first condition hasn't been resolved yet and the CPU can't just commit the changes)? Does the CPU just speculate inside the speculation? What happens if the last condition is mispredicted but the first wasn't? Does it just roll back all the way?

I'm talking about something like this:

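```c
if (value_in_memory == y) {         // slow load: outer condition unresolved for a while
    // computations
    if (another_val_memory == x) {  // second slow load, reached while still speculating
        // computations
    }
}
```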


asked Dec 06 '19 08:12

C. Pinto

1 Answer

Speculative execution is the regular state of execution, not a special mode that an out-of-order CPU enters when it sees a branch and then leaves when the branch is no longer in flight.

This is easier to see if you consider that it's not just branches that can "fault": many instructions can, including those that access memory or that have restrictions on their input values. So any substantial out-of-order execution implies constant speculation, and CPUs are built around that idea.

So "nested branches" doesn't end up being special in that sense.
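To make this concrete for the question's example, here is a toy model in C. It is a sketch under heavy simplifying assumptions, not how any real CPU is built, and `ToyRob`, `rob_push`, `rob_squash_after` and `build_nested_if` are hypothetical names. Instructions sit in a reorder buffer in program order, and a misprediction discards only what is younger than the mispredicted branch, so an inner mispredict does not roll back to the outer branch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ROB_CAP 16 /* hypothetical capacity */

/* Entries are kept in program order: index 0 is the oldest. */
typedef struct {
    int is_branch[TOY_ROB_CAP];
    size_t count;
} ToyRob;

/* Allocate the next instruction in program order; returns its index. */
static size_t rob_push(ToyRob *rob, int is_branch) {
    assert(rob->count < TOY_ROB_CAP);
    rob->is_branch[rob->count] = is_branch;
    return rob->count++;
}

/* The branch at index br turned out to be mispredicted: every younger entry
   is on the wrong path and is discarded, however many branches it contains.
   Older entries, including older branches still awaiting resolution, stay.
   Returns the number of discarded entries. */
static size_t rob_squash_after(ToyRob *rob, size_t br) {
    assert(br < rob->count && rob->is_branch[br]);
    size_t squashed = rob->count - (br + 1);
    rob->count = br + 1; /* fetch would restart on the correct path here */
    return squashed;
}

/* The question's nested if, assuming both branches were predicted taken. */
static void build_nested_if(ToyRob *rob, size_t *outer_br, size_t *inner_br) {
    rob->count = 0;
    rob_push(rob, 0);             /* load value_in_memory         */
    *outer_br = rob_push(rob, 1); /* if (value_in_memory == y)    */
    rob_push(rob, 0);             /* computations                 */
    rob_push(rob, 0);             /* load another_val_memory      */
    *inner_br = rob_push(rob, 1); /* if (another_val_memory == x) */
    rob_push(rob, 0);             /* inner computations           */
}
```

In this model, squashing at the inner branch discards only the one inner computation and leaves the outer branch waiting, while squashing at the outer branch (on a freshly built buffer) discards four entries, the inner branch among them.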

Now, modern CPUs have a variety of methods for quick branch misprediction recovery, faster than recovery from other types of faults¹. For example they may snapshot the state of the register mapping at some branches, to allow recovery to start before the branch is at the head of the reorder buffer. Since it is not always feasible to snapshot at all branches, there might be complicated heuristics involved to decide where to take snapshots.

I mention this last part because it is one way in which nested branches might matter: when there are lots of branches in flight, you might hit some microarchitectural limits related to the tracking of these branches for recovery purposes. For more details, you can look through patents for "branch order buffer" (that is Intel's term, but other vendors no doubt use similar techniques).
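The snapshot idea, and the limit it runs into when many branches are in flight, can be sketched with a toy register renamer in C. Everything here is an assumption for illustration: `Renamer`, `checkpoint_branch` and `recover_fast` are hypothetical names, the 4-register ISA and 2 checkpoint slots are made up, the "first free slot" policy stands in for whatever heuristics real CPUs use, and branch ids are assumed to increase in program order.

```c
#include <assert.h>

#define NUM_ARCH_REGS 4   /* hypothetical tiny ISA */
#define NUM_CHECKPOINTS 2 /* hypothetical: fewer slots than branches in flight */

/* Register mapping: architectural register -> physical register. */
typedef struct { int phys[NUM_ARCH_REGS]; } RegMap;

typedef struct {
    RegMap map;                       /* speculative mapping used by rename */
    RegMap ckpt[NUM_CHECKPOINTS];     /* snapshots taken at some branches */
    int ckpt_branch[NUM_CHECKPOINTS]; /* branch id owning each slot, -1 if free */
    int next_phys;                    /* toy allocator: never reuses registers */
} Renamer;

static void renamer_init(Renamer *r) {
    for (int i = 0; i < NUM_ARCH_REGS; i++) r->map.phys[i] = i;
    for (int s = 0; s < NUM_CHECKPOINTS; s++) r->ckpt_branch[s] = -1;
    r->next_phys = NUM_ARCH_REGS;
}

/* An instruction writing architectural register `reg` gets a new physical one. */
static void rename_dest(Renamer *r, int reg) {
    assert(reg >= 0 && reg < NUM_ARCH_REGS);
    r->map.phys[reg] = r->next_phys++;
}

/* At a branch, snapshot the mapping if a slot is free. Returns the slot used,
   or -1 if this branch gets no checkpoint. */
static int checkpoint_branch(Renamer *r, int branch_id) {
    for (int s = 0; s < NUM_CHECKPOINTS; s++) {
        if (r->ckpt_branch[s] == -1) {
            r->ckpt_branch[s] = branch_id;
            r->ckpt[s] = r->map;
            return s;
        }
    }
    return -1;
}

/* Branch `branch_id` mispredicted. With a checkpoint, the mapping is restored
   right away and 1 is returned; without one, 0 is returned and some slower
   recovery would be needed. */
static int recover_fast(Renamer *r, int branch_id) {
    int slot = -1;
    for (int s = 0; s < NUM_CHECKPOINTS; s++)
        if (r->ckpt_branch[s] == branch_id) slot = s;
    if (slot < 0) return 0;
    r->map = r->ckpt[slot];
    /* Younger branches were on the wrong path, so their slots are freed too. */
    for (int s = 0; s < NUM_CHECKPOINTS; s++)
        if (r->ckpt_branch[s] >= branch_id) r->ckpt_branch[s] = -1;
    return 1;
}
```

With three nested branches in flight, the third finds no free slot, so a misprediction on it cannot use the fast path here and would fall back to something like the retire-time method in the footnote.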

¹ The basic recovery method is to keep executing until the faulting instruction is the next to retire, and then throw away all younger instructions. In the context of branch mispredictions, this means you could actually suffer two or more mispredictions, only the oldest of which takes effect: e.g., a younger branch mispredicts, and while execution continues up to that branch (at which point recovery can occur), an older branch also mispredicts, so the younger misprediction ends up being discarded along with everything else on the wrong path.
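The footnote's scenario can be reproduced with another toy model in C. As before, this is a simplified sketch: `RetireRob`, `rob_fill`, `mark_mispredict` and `retire` are hypothetical names, and no fast recovery is modeled at all. Mispredictions are only recorded when detected and acted on at retirement, so a younger branch that is detected first is simply discarded once an older mispredicted branch reaches the head.

```c
#include <assert.h>
#include <stddef.h>

#define RETIRE_ROB_CAP 16 /* hypothetical capacity */

typedef struct {
    int mispredicted[RETIRE_ROB_CAP]; /* set when execution detects a mispredict */
    size_t head;                      /* oldest entry not yet retired */
    size_t tail;                      /* one past the youngest entry */
} RetireRob;

/* Start with n instructions in flight and no mispredictions detected yet. */
static void rob_fill(RetireRob *rob, size_t n) {
    assert(n <= RETIRE_ROB_CAP);
    for (size_t i = 0; i < RETIRE_ROB_CAP; i++) rob->mispredicted[i] = 0;
    rob->head = 0;
    rob->tail = n;
}

/* Branches can resolve in any order; detection only marks the entry. */
static void mark_mispredict(RetireRob *rob, size_t idx) {
    assert(idx >= rob->head && idx < rob->tail);
    rob->mispredicted[idx] = 1;
}

/* Retire in program order. The first marked entry to reach the head triggers
   recovery: everything younger is thrown away, including any younger branch
   that had already been marked. Returns that entry's index, or (size_t)-1 if
   all entries retired normally. */
static size_t retire(RetireRob *rob) {
    while (rob->head < rob->tail) {
        size_t i = rob->head++;
        if (rob->mispredicted[i]) {
            for (size_t j = rob->head; j < rob->tail; j++) rob->mispredicted[j] = 0;
            rob->tail = rob->head;
            return i;
        }
    }
    return (size_t)-1;
}
```

Marking entry 4 and then entry 1 makes `retire` return 1 and clears the mark on entry 4; marking only entry 4 makes it return 4.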

answered Sep 22 '22 22:09

BeeOnRope
