I was testing some code on Visual Studio 2008 and noticed security_cookie. I can understand the point of it, but I don't understand what the purpose of this instruction is:
rep ret /* REP to avoid AMD branch prediction penalty */
Of course I can understand the comment :) but what exactly is this prefix doing in combination with the ret, and what happens if ecx is != 0? Apparently the loop count from ecx is ignored when I debug it, which is to be expected.
The code where I found this was here (injected by the compiler for security):
void __declspec(naked) __fastcall __security_check_cookie(UINT_PTR cookie)
{
    /* x86 version written in asm to preserve all regs */
    __asm {
        cmp ecx, __security_cookie
        jne failure
        rep ret /* REP to avoid AMD branch prediction penalty */
failure:
        jmp __report_gsfailure
    }
}
Description. Use the rep (repeat while equal), repnz (repeat while nonzero) or repz (repeat while zero) prefixes in conjunction with string operations. Each prefix causes the associated string instruction to repeat until the count register (CX) or the zero flag (ZF) matches a tested condition.
Description. The ret instruction transfers control to the return address located on the stack. This address is usually placed on the stack by a call instruction. Issue the ret instruction within the called procedure to resume execution flow at the instruction following the call.
There's a whole blog named after this instruction. And the first post describes the reason behind it: http://repzret.org/p/repzret/
Basically, there was an issue in AMD's branch predictor when a single-byte ret immediately followed a conditional jump, as in the code you quoted (and in a few other situations). The workaround was to add the rep prefix, which is ignored by the CPU but avoids the predictor penalty.
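To make the "single-byte ret" point concrete, here is a minimal sketch in C of what the two encodings look like as raw bytes. The byte values are the standard x86 ones (0xF3 for the REP/REPE/REPZ prefix, 0xC3 for the near ret); the helper name ret_sequence_length is made up for illustration and is not part of any real tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard x86 encodings: 0xF3 is the REP/REPE/REPZ prefix byte,
   0xC3 is the near RET opcode. */
enum { PREFIX_REP = 0xF3, OPCODE_RET_NEAR = 0xC3 };

/* Hypothetical helper (illustration only): returns how many bytes the
   return sequence at the start of `code` occupies.
     1 -> plain "ret"   (C3), the form affected by the predictor issue
     2 -> "rep ret"     (F3 C3), the padded workaround
     0 -> the buffer does not start with either form */
static size_t ret_sequence_length(const uint8_t *code, size_t len)
{
    if (len >= 1 && code[0] == OPCODE_RET_NEAR)
        return 1;
    if (len >= 2 && code[0] == PREFIX_REP && code[1] == OPCODE_RET_NEAR)
        return 2;
    return 0;
}
```

So the workaround costs exactly one extra byte: the ret is no longer a one-byte instruction, while the return itself is unchanged.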
Apparently, some AMD processors' branch predictors behave badly when a branch's target or fallthrough is a ret instruction, and adding the rep prefix avoids this.
As to the meaning of rep ret, there is no mention of this instruction sequence in the Intel Instruction Set Reference, and the documentation of rep is not very helpful:
The behavior of the REP prefix is undefined when used with non-string instructions.
This means, at the very least, that the rep prefix doesn't have to make ret behave in a repeating manner.
Now, from the AMD instruction set reference (1.2.6 Repeat Prefixes):
The prefixes should only be used with such string instructions.
In general, the repeat prefixes should only be used in the string instructions listed in tables 1-6, 1-7, and 1-8 above [which do not contain ret].
So it really does seem to be undefined behavior, but one can assume that, in practice, processors just ignore rep prefixes on ret instructions.
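As a rough illustration of why ret falls outside the documented use of the prefix, here is a small C sketch that checks a one-byte opcode against the string instructions (INS, OUTS, MOVS, CMPS, STOS, LODS, SCAS). I'm assuming these are the instructions the AMD tables list; the opcode values are the standard one-byte encodings, and the function name is_string_opcode is made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (illustration only): true if `opcode` is one of the
   one-byte x86 string instructions, which are the only instructions the
   repeat prefixes are documented for. */
static bool is_string_opcode(uint8_t opcode)
{
    switch (opcode) {
    case 0x6C: case 0x6D: /* INS  (byte / word-dword) */
    case 0x6E: case 0x6F: /* OUTS */
    case 0xA4: case 0xA5: /* MOVS */
    case 0xA6: case 0xA7: /* CMPS */
    case 0xAA: case 0xAB: /* STOS */
    case 0xAC: case 0xAD: /* LODS */
    case 0xAE: case 0xAF: /* SCAS */
        return true;
    default:
        return false; /* includes 0xC3 (near ret) */
    }
}
```

With this, is_string_opcode(0xA4) is true (movsb) and is_string_opcode(0xC3) is false (ret), which is the case the manuals leave undefined.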