While reading and learning from open source OSes, I stumbled across an extremely complicated way of calling a "method" in assembly. It uses the ret instruction to call a library function like this:
push rbp ; rsp[1] = rbp
mov rbp, .continue ; save return label to rbp
xchg rbp, QWORD [rsp] ; restore rbp and set rsp[1] to return label
push rbp ; rsp[0] = rbp
mov rbp, 0x0000700000000000 + LIB_PTR_TABLE.funcOffset ; rbp = pointer to func pointer
mov rbp, QWORD [rbp] ; rbp = func pointer
xchg rbp, QWORD [rsp] ; restore rbp and set rsp[0] to func pointer
; "call" library by "returning" to the address we just planted
ret
.continue:
I added the comments in order to understand it myself, and it seems I am right or close enough, because all the experiments I did succeeded. But then I tried doing this, which also works perfectly:
mov rax, 0x0000700000000000 + LIB_PTR_TABLE.funcOffset ; rax = ptr to func ptr
mov rax, QWORD [rax] ; rax = func ptr
call rax ; actually call the library function in a normal fashion
Looking at the number of instructions and what the CPU actually has to do in both cases, one would assume that if either variant were faster, it would be the "call" variant. But since the "ret" variant was the one actually used, and coming up with it requires a fair bit of knowledge in the first place, what advantages does the first variant have? (Or does it?)
The ret instruction transfers control to the return address located on the stack. That address is normally placed there by a call instruction, so issuing ret inside the called procedure resumes execution at the instruction following the call. In other words, call takes care of saving the return address for you, and ret takes care of using that address to get back to the caller; the return value (typically in rax) is the usual way of passing data back to the caller.
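As a minimal sketch of those mechanics (NASM-style; my_func is a made-up name, not something from the question):
my_func:
mov rax, 42               ; return value in rax (System V x86-64 convention)
ret                       ; pop the return address off the stack and jump to it
caller:
call my_func              ; push the address of the next instruction, then jump to my_func
                          ; execution resumes here after my_func's ret, with 42 in rax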
As CPUs get faster, the chance of a CPU stalling (and being unable to do anything) due to things like cache misses and branch mispredictions increases. To help avoid these stalls, most modern 80x86 CPUs have a bunch of logic to predict the target address of control flow changes, including branch direction predictors, branch target predictors, return stack buffers, etc.
The problem is that a malicious attacker (using speculative execution and measuring timing) can extract confidential information from the state the CPU collects to improve performance, including from branch direction predictors, branch target predictors, return stack buffers, etc.
When this was discovered, people (mostly kernel developers) scrambled to find ways to mitigate the security problem; specifically, ways to avoid, spoil or pollute the data the CPU collects.
More specifically (for the code you've shown): if the code used call rax, then it would add data to the CPU's return stack buffer that a malicious attacker could probe to determine something about the original value in rax (and if rax is supposed to be confidential, then this constitutes a confidentiality leak).
One alternative is to push a return address and then use an indirect jump. In this case it would just leave (confidential) data in the CPU's branch target buffer that could be probed by an attacker, which doesn't really help.
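A minimal sketch of that alternative, reusing the question's LIB_PTR_TABLE and clobbering rax for brevity (the question's version goes through xchg so rbp survives instead):
lea rax, [rel .continue]                                ; compute the return address by hand
push rax                                                ; and push it, just like call would have
mov rax, 0x0000700000000000 + LIB_PTR_TABLE.funcOffset  ; rax = pointer to func pointer
mov rax, QWORD [rax]                                    ; rax = func pointer
jmp rax                                                 ; indirect jump: the target still lands in the branch target buffer
.continue: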
Using ret instead prevents the security problem by not storing anything in the return stack buffer (or in the branch target buffer). As a side-effect, it will also "de-sync" the CPU's return stack buffer, obfuscating previous calls/future returns a little.
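Stripped of the rbp-preserving xchg juggling, the core of the ret variant in the question is just "push the real target, then ret to it". A sketch that clobbers rax instead of preserving rbp:
lea rax, [rel .continue]                                ; return address for the library function
push rax
mov rax, 0x0000700000000000 + LIB_PTR_TABLE.funcOffset  ; rax = pointer to func pointer
mov rax, QWORD [rax]                                    ; rax = func pointer
push rax                                                ; plant the target as a fake "return address"
ret                                                     ; "return" into the library function instead of doing an indirect call/jump
.continue: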
Sadly, all of this causes a performance problem: it brings us back to "as CPUs get faster the chance of a CPU stalling increases", and adds the cost of fetching code from the wrong address on top of the cost of the stall.