Have been reading Agner Fog's "The microarchitecture of Intel, AMD and VIA CPUs" and on page 34 he describes "return address prediction":
http://www.agner.org/optimize/microarchitecture.pdf
3.15 Returns (all processors except P1)
A better method is used for returns. A Last-In-First-Out buffer, called the return stack buffer,remembers the return address every time a call instruction is executed, and it uses this for predicting where the corresponding return will go. This mechanism makes sure that return instructions are correctly predicted when the same subroutine is called from several different locations.
I am a little unclear what the need for this is, given that the return addresses are stored on the stack anyway?
So what is the purpose of storing return addresses on the stack if there is also this technique? Is the stack-stored value only used if this prediction technique doesnt work?
The Return Stack Buffer (RSB) is a fixed-size buffer that provides predictions for RET instructions. It can underflow in certain conditions. For example, RSB underflow may occur when a program is returning from a deep call stack due to executing more RETs than the number of entries in the RSB for that processor.
The return address is specified in the stack when a program contains a function call or subroutine. When the function is called then, after its complete execution it has to return back to the original program i.e., the next instruction after the function call in the program.
At function return, the stack pointer is instead restored to the frame pointer, the value of the stack pointer just before the function was called. Each stack frame contains a stack pointer to the top of the frame immediately below. The stack pointer is a mutable register shared between all invocations.
Predictors are normally part of the fetch stage, in order to determine which instructions to fetch next. This takes place before the processor has decoded the instructions, and therefore doesn't even know with certainty that a branch instruction exists. Like all predictors, the intent of the return address predictor is to get the direction / target of the branch faster. A return instruction is a branch, and so it would normally have a branch predictor entry to determine whether it is taken and where the target is. The return address predictor is consulted in lieu of the normal branch target buffer.
So perhaps 50 instructions before the return statement is actually "executed", the fetch stage predicts a return instruction and the instruction address to fetch next. Later, when the return is executed, the address is read from the stack and compared with where the return was predicted to go. If they are the same, execution continues, else execution is rolled back to use the proper return address.
Why store on the stack? First, the processor does not know if the predictor has worked without comparing against the address stored on the stack. Second, the stack is the "official" return address, which might be changed for legitimate reasons. Third, the return address predictor has a limited number of entries. The stack is needed for the return instructions for which there was not room to store the addresses in the predictor.
On top of Brians' great explanation, there's the fact that the stack resides in memory. You don't want to have to go to the memory unit and do a memory lookup (not to mention address translation into physical), from some stack address everytime you want to predict the outcome of a branch. The branch prediction wants to be self-sufficient. You can also view the RSB as another form of caching data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With