
x86 assembly: Pop a value without storing it

In x86 assembly, is it possible to remove a value from the stack without storing it? Something along the lines of pop word null? I could obviously use add esp,4, but maybe there's a nice and clean CISC mnemonic I'm missing?

NeoTheThird asked Feb 09 '18 12:02


1 Answer

add esp,4 / add rsp,8 is the normal / idiomatic / clean way. No special way is needed because stacks aren't magical or special (at least not in this respect); it's just a pointer in a register with some instructions that use it implicitly. (And for kernel stacks, interrupts use it asynchronously so software couldn't implement a kernel red-zone even if it wanted to...)
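As a minimal sketch of the idiomatic form (NASM syntax, 32-bit mode assumed; the pushed value is arbitrary):

```nasm
bits 32

    push eax            ; put one 4-byte value on the stack
    ; ... the value turns out not to be needed ...
    add  esp, 4         ; discard it: no load, ESP just moves up one slot
                        ; (note: add writes EFLAGS like any other add)
```

Nothing is read from memory here; the old value simply stays below ESP until something overwrites it.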

Other than that, the magical CISC way to clean up a whole stack frame at the end of a function is leave = mov esp, ebp / pop ebp (or the 16 or 64-bit equivalent). Unlike enter, it's fast enough on modern CPUs to be usable in practice, but it's still a 3-uop instruction on Intel CPUs (see http://agner.org/optimize/). And leave only works if you spent extra instructions making a stack frame with ebp / rbp in the first place. (Usually you wouldn't do that unless you need to reserve a variable amount of stack space, e.g. with push in a loop to make an array, or the equivalent of a C99 VLA or alloca. Or for beginner code to make access to locals easier, or in 16-bit mode where SP can't be used in addressing modes.)
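For illustration, a sketch of a function that sets up a frame pointer and tears it down with leave (NASM syntax, 32-bit; the label func and the 16 bytes of locals are made up for the example):

```nasm
bits 32

func:                       ; hypothetical function label
    push ebp                ; save caller's frame pointer
    mov  ebp, esp           ; establish our own frame
    sub  esp, 16            ; reserve 16 bytes of locals (arbitrary size)

    mov  dword [ebp-4], 1   ; locals are addressed relative to EBP

    leave                   ; same effect as: mov esp, ebp / pop ebp
    ret
```

However much ESP moved inside the body, leave restores it from EBP, which is why it only helps when the frame pointer was set up to begin with.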

The magical CISC way to clean up stack-args is for the callee to use ret imm16 (costing 1 extra uop) to pop the args, creating a calling convention where the callee cleans the stack. In a caller-pops calling convention, there's no way to use this form of ret, but you can simply leave the stack pointer where it is after the call and use mov, instead of push, to store the args for the next function call (if the function needs any stack-args at all; register-arg calling conventions are generally more efficient).
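A rough sketch of the two conventions side by side (NASM syntax, 32-bit; all labels are hypothetical and both callees just add their two stack args):

```nasm
bits 32

callee_pops:                ; callee-cleans convention
    mov  eax, [esp+4]       ; first stack arg
    add  eax, [esp+8]       ; second stack arg
    ret  8                  ; return, then pop 8 bytes of args

caller_pops:                ; caller-cleans convention
    mov  eax, [esp+4]
    add  eax, [esp+8]
    ret                     ; args are still on the stack after return

caller:
    push 2
    push 1
    call callee_pops        ; ESP is already back where it started

    push 2
    push 1
    call caller_pops
    add  esp, 8             ; caller has to remove the args itself
    ret
```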

So the magic CISC ways have no performance advantage on modern CPUs, only a minor code-size advantage.


There are 2 reasons you might use pop reg instead of add esp,4:

  • code-size: pop r32/r64 is a one-byte instruction, vs. 3 bytes for add esp,4 or 4 bytes for add rsp,8.
  • performance: Intel's stack engine has to insert extra stack-sync uops when you use esp / rsp explicitly after a stack instruction (push/pop/call/ret). So after a call (which returns with a ret), it saves a uop to use pop instead of add esp,4 before you ret at the end of the function.

    AMD's stack engine doesn't need extra stack-sync uops, but still makes push/pop single-uop instructions. That's unlike older Intel/AMD CPUs without a stack engine, where push/pop cost more than plain mov loads/stores: they needed a separate uop for the stack-pointer modification and created a data dependency on the stack pointer.
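To make both points concrete, here is a sketch in NASM syntax for 64-bit mode, with the machine-code bytes in comments. The labels func and other are hypothetical, and rcx is assumed to be dead at that point:

```nasm
bits 64

other:                  ; hypothetical callee
    ret

func:
    push rax            ; 50            reserve one slot (e.g. for alignment)
    call other
    pop  rcx            ; 59            1 byte; only implicit use of RSP
    ret

func_with_add:
    push rax            ; 50
    call other
    add  rsp, 8         ; 48 83 C4 08   4 bytes; explicit use of RSP
    ret
```

In 32-bit mode the corresponding pair is pop ecx (59, 1 byte) versus add esp, 4 (83 C4 04, 3 bytes).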

See Why does this function push RAX to the stack as the first operation? for more details about performance.

If you were looking for aesthetics, well, you can indent, format, and comment your code nicely, but beyond that, you chose the wrong language when you picked x86 asm if aesthetics outweigh optimization.


Of course, if you need to adjust the stack by more than 1 register-width, definitely use add if you don't need the data that pop would load. Or, if you need to adjust it by +128 bytes, use sub esp, -128, because -128 is encodable as a sign-extended imm8, but +128 isn't.
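The encoding difference can be seen directly (NASM syntax, 32-bit; bytes shown in comments). Both instructions move ESP up by 128 bytes:

```nasm
bits 32

    add  esp, 128       ; 81 C4 80 00 00 00   6 bytes: +128 needs an imm32
    sub  esp, -128      ; 83 EC 80            3 bytes: -128 fits a sign-extended imm8
```

A signed 8-bit immediate covers -128..+127, so +128 is the one adjustment size where sub with a negative constant is shorter.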

Or maybe use lea esp, [esp+4], like gcc does with -mtune=atom. (For in-order Atom, not Silvermont.) Like I said, if you wanted clean, you shouldn't have picked x86 asm.
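A sketch of the lea form (NASM syntax, 32-bit). One property not mentioned above, but true of lea in general, is that it does not write EFLAGS, so it can sit between a compare and a branch; the label skip is hypothetical:

```nasm
bits 32

    cmp  eax, ebx
    lea  esp, [esp+4]   ; 8D 64 24 04   4 bytes; discards one slot, flags untouched
    jne  skip           ; still branches on the result of the cmp above
    xor  eax, eax
skip:
    ret
```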


You can almost always find a dead register to pop into. If you need to adjust E/RSP by one stack slot before popping some registers you actually wanted to pop, you can always pop the same register twice.
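For example, a function epilogue that has one unwanted slot above its saved registers could look like this (NASM syntax, 32-bit; the stack layout is assumed for illustration):

```nasm
bits 32

; assumed stack on entry to the epilogue, lowest address first:
;   [unwanted slot] [saved ebx] [saved esi] [return address]
epilogue:
    pop  ebx            ; loads the unwanted value; overwritten by the next pop
    pop  ebx            ; the real saved ebx
    pop  esi
    ret
```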

In the extremely rare case where none of the 7 (x86-32) or 15 (x86-64) non-stack registers are available as pop destinations, this optimization is not available and you should simply use the traditional add. It's not worth spending extra instructions to make it possible to pop; that would outweigh the minor benefit of using pop.

Note that pop Sreg (segment register) still consumes the regular "stack width" (32 or 64 bits, depending on mode), rather than only 16 for a 16-bit register. But only pop ds/es/ss are single-byte. pop fs/gs are 2 bytes each. So if you're optimizing for code-size, pop gs is 1 byte smaller than add esp,4, but much much slower. (Or 2 bytes smaller than add rsp,8).

Peter Cordes answered Nov 22 '22 04:11