What is an assembly-level representation of pushl/popl %esp?

Tags: x86, assembly, stack-pointer, stack-memory, instruction-set

I'm trying to understand the behavior of pushing and popping the stack pointer register. In AT&T:

pushl %esp

and

popl %esp

Note that they store the computed value back into %esp.

I'm considering these instructions independently, not in sequence. I know that the value stored in %esp is always the value before the increment/decrement, but how could I represent the behavior in assembly language? This is what I've come up with so far.

For pushl %esp (ignoring FLAGS and the effect on the temporary register):

movl %esp, %edx     # 1. save value of %esp
subl $4, %esp       # 2. decrement stack pointer
movl %edx, (%esp)   # 3. store old value of %esp on top of stack

For popl %esp:

movl (%esp), %esp   # You wouldn’t need the increment portion.

Is this correct? If not, where am I going wrong?

asked Feb 19 '13 22:02

amorimluc

2 Answers

As it says about push esp in Intel® 64 and IA-32 Architectures Developer's Manual: Combined Volumes (actually in vol.2, or HTML scrape at https://www.felixcloutier.com/x86/push):

The PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. If a PUSH instruction uses a memory operand in which the ESP register is used for computing the operand address, the address of the operand is computed before the ESP register is decremented.

And regarding pop esp (https://www.felixcloutier.com/x86/pop):

The POP ESP instruction increments the stack pointer (ESP) before data at the old top of stack is written into the destination.

and, for pop with a memory destination that uses ESP as a base, such as pop 16(%esp):

If the ESP register is used as a base register for addressing a destination operand in memory, the POP instruction computes the effective address of the operand after it increments the ESP register.

So yes, your pseudo-code is correct except for modifying FLAGS and %edx.

answered Oct 20 '22 08:10

nrz

Yes, those sequences are correct except for the effect on FLAGS, and of course push %esp doesn't clobber %edx. Instead of %edx, imagine an internal temporary¹ if you want to break it down into separate steps, rather than thinking of push as a primitive operation that snapshots its input (source operand) before doing anything else.

(Similarly pop DST can be modeled as pop %temp / mov %temp, DST, with all effects of pop finished before it evaluates and writes to the destination, even if that is or involves the stack pointer.)
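To make the "internal temporary" model concrete, here is a minimal Python sketch of it. It is a toy simulator, not a description of how any CPU implements these instructions. The Machine class, its reg and mem dictionaries, and the starting address 0x1000 are all made up for illustration.

```python
MASK = 0xFFFFFFFF  # keep every value in the 32-bit range


class Machine:
    """Toy model: a few 32-bit registers plus sparse dword memory (hypothetical)."""

    def __init__(self, esp):
        self.reg = {"esp": esp & MASK, "eax": 0}
        self.mem = {}  # address -> 32-bit value

    def push(self, src):
        temp = self.reg[src]                             # snapshot the source first
        self.reg["esp"] = (self.reg["esp"] - 4) & MASK   # then esp -= 4
        self.mem[self.reg["esp"]] = temp                 # store at the new top of stack

    def pop(self, dst):
        temp = self.mem[self.reg["esp"]]                 # load from the old top of stack
        self.reg["esp"] = (self.reg["esp"] + 4) & MASK   # esp += 4
        self.reg[dst] = temp                             # write the destination last


m = Machine(esp=0x1000)
m.push("esp")                    # models push %esp
print(hex(m.reg["esp"]), hex(m.mem[0x0FFC]))

m.mem[m.reg["esp"]] = 0x2000     # plant a value at the top of stack
m.pop("esp")                     # models pop %esp
print(hex(m.reg["esp"]))
```

In this model, push %esp stores the pre-decrement value (0x1000) at the new top of stack, and pop %esp leaves ESP equal to the loaded value, because the final write to the destination overrides the increment.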

`push` equivalents that work even in the ESP special cases

(In all of these, I'm assuming 32-bit compat or protected mode with SS configured normally, with stack address size matching the mode, if it's even possible for that not to be the case. The 64-bit mode equivalent with %rsp works the same way with -8 / +8. 16-bit mode doesn't allow (%sp) addressing modes so you'd have to consider this as pseudo-code.)

#push SRC         for any source operand including %esp or 1234(%esp)
   mov  SRC, %temp
   lea  -4(%esp), %esp         # esp-=4 without touching FLAGS
   mov  %temp, (%esp)

i.e. mov SRC, %temp ; push %temp
Or since we're describing an uninterruptible transaction anyway (a single push instruction),
we don't need to move ESP before storing:

#push %REG              # or immediate, but not memory source
   mov  %REG, -4(%esp)
   lea  -4(%esp), %esp

(This simpler version wouldn't assemble for real with a memory source, only register or immediate, as well as being unsafe if an interrupt or signal handler runs between the mov and the LEA. In real assembly, mov mem, mem with two explicit addressing modes isn't encodeable, but push (%eax) is, because the memory destination is implicit. You could consider it as pseudo-code even for a memory source. But snapshotting in a temporary is a more realistic model of what happens internally, like the first block or mov SRC, %temp / push %temp.)

If you're talking about actually using such a sequence in a real program, I don't think there's a way to exactly duplicate push %esp without a temporary register (first version), or (second version) disabling interrupts or having an ABI with a red-zone. (Like x86-64 System V for non-kernel code, so you could duplicate push %rsp.)

`pop` equivalents:

#pop DST   works for any operand
  mov  (%esp), %temp
  lea  4(%esp), %esp      # esp += 4 without touching FLAGS
  mov  %temp, DST         # even if DST is %esp or 1234(%esp)

i.e. pop %temp / mov %temp, DST. That accurately reflects the case where DST is a memory addressing mode that involves ESP: the value of ESP after the increment is used. I verified Intel's docs for this with push $5 ; pop -8(%esp). That copied the dword 5 to the dword right below the one written by push when I single-stepped it in GDB on a Skylake CPU. If -8(%esp) address calculation had happened using ESP before that instruction executed, there would have been a 4-byte gap.
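The address arithmetic behind that push $5 ; pop -8(%esp) experiment can be sketched in Python. This only models the two candidate orderings. It does not reproduce the GDB session, and the helper names, the ea_after_increment flag and the starting ESP of 0x1000 are assumptions for illustration.

```python
MASK = 0xFFFFFFFF  # keep every value in the 32-bit range


def push_imm(state, imm):
    """Model push $imm: decrement ESP, then store the immediate."""
    state["esp"] = (state["esp"] - 4) & MASK
    state["mem"][state["esp"]] = imm & MASK


def pop_to_mem(state, disp, ea_after_increment=True):
    """Model pop disp(%esp) and return the address that was written."""
    old_esp = state["esp"]
    temp = state["mem"][old_esp]              # load from the old top of stack
    state["esp"] = (old_esp + 4) & MASK       # esp += 4
    base = state["esp"] if ea_after_increment else old_esp
    addr = (base + disp) & MASK               # effective address of the destination
    state["mem"][addr] = temp
    return addr


def run(ea_after_increment):
    state = {"esp": 0x1000, "mem": {}}
    push_imm(state, 5)
    pushed_at = state["esp"]                  # where push wrote the dword
    written_at = pop_to_mem(state, -8, ea_after_increment)
    return pushed_at, written_at


for flag in (True, False):
    pushed_at, written_at = run(flag)
    print(flag, hex(pushed_at), hex(written_at), pushed_at - written_at)
```

With the address computed after the increment, the destination sits 4 bytes below the pushed dword, so the two are adjacent. With the address computed from the old ESP, the distance is 8 bytes, which is the 4-byte gap that did not show up on real hardware.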

In the special case of pop %esp, yes that steps on the increment, simplifying to:

#pop %esp  # 3 uops on Skylake, 1 byte
   mov  (%esp), %esp             # 1 uop on Skylake.  3 bytes of machine-code size

Intel manuals have misleading pseudocode

Intel's pseudocode in the Operation sections of their instruction-set manual entries (SDM vol.2) does not accurately reflect the stack-pointer special cases. Only the extra paragraphs in the Description sections (quoted in @nrz's answer) get that right.

https://www.felixcloutier.com/x86/pop shows (for StackAddrSize = 32 and OperandSize = 32) a load into DEST and then incrementing ESP

     DEST ← SS:ESP; (* Copy a doubleword *)
     ESP ← ESP + 4;

But that's misleading for pop %esp because it implies that ESP += 4 happens after ESP = load(SS:ESP). Correct pseudo-code would use

 if ... operand size etc.
     TEMP ← SS:ESP; (* Copy a doubleword *)
     ESP ← ESP + 4;

 ..
 // after all the if / else size blocks:
 DEST ← TEMP

Intel gets this right for other instructions like pshufb where the pseudo-code starts out with TEMP ← DEST to snapshot the original state of the read-write destination operand.
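The difference between the two pseudocode orderings for pop %esp can be checked with a small Python sketch. Both functions are illustrative transcriptions of the pseudocode above, with a made-up memory dictionary and addresses. Neither is taken from Intel's manual verbatim.

```python
MASK = 0xFFFFFFFF  # keep every value in the 32-bit range


def pop_esp_literal(esp, mem):
    """Read the Operation section literally: DEST <- SS:ESP; ESP <- ESP + 4."""
    esp = mem[esp]            # DEST is ESP, so the load overwrites it
    esp = (esp + 4) & MASK    # the increment then applies to the loaded value
    return esp


def pop_esp_corrected(esp, mem):
    """Corrected order: TEMP <- SS:ESP; ESP <- ESP + 4; DEST <- TEMP."""
    temp = mem[esp]
    esp = (esp + 4) & MASK    # increment happens, but is overwritten below
    esp = temp                # the destination write comes last
    return esp


mem = {0x0FFC: 0x2000}        # hypothetical stack slot holding the new ESP
print(hex(pop_esp_literal(0x0FFC, mem)))    # 0x2004
print(hex(pop_esp_corrected(0x0FFC, mem)))  # 0x2000
```

Only the corrected version matches the documented behavior and the mov (%esp), %esp equivalent, where ESP ends up holding exactly the loaded value.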

Similarly, https://www.felixcloutier.com/x86/push#operation shows RSP being decremented first, not showing the src operand being snapshotted before that. Only the extra paragraphs in the text Description section correctly handle that special case.

AMD's manual Volume 3: General-Purpose and System Instructions (March 2021) is similarly wrong about this (my emphasis):

Copies the value pointed to by the stack pointer (SS:rSP) to the specified register or memory location and then increments the rSP by 2 for a 16-bit pop, 4 for a 32-bit pop, or 8 for a 64-bit pop.

Unlike Intel, it doesn't even document the special cases of popping into the stack pointer itself or with a memory operand involving rSP. At least not here, and a search on push rsp or push esp didn't find anything.

(AMD uses rSP to mean SP / ESP / RSP depending on current stack-size attribute selected by SS.)

AMD doesn't have a pseudocode section like Intel does, at least not for supposedly simple instructions like push/pop. (There is one for pusha.)

Footnote 1: That could even be what happens on some CPUs (although I don't think so). For example on Skylake, Agner Fog measured push %esp as 2 uops for the front-end vs. 1 micro-fused store for pushing any other register.

We do know that Intel CPUs do have some registers that get renamed like the architectural registers, but which are only accessible by microcode. e.g. https://blog.stuffedcow.net/2013/05/measuring-rob-capacity/ mentions "some extra architectural registers for internal use." So mov %esp, %temp / push %temp could in theory be how it's decoded.

But a more likely explanation is that the extra measured uops in a long sequence of push %esp instructions are just stack-sync uops, like we get any time the OoO back-end explicitly reads ESP after a push/pop operation. e.g. push %eax / mov %esp, %edx would also cause a stack-sync uop. (The "stack engine" is what avoids needing an extra uop for the esp -= 4 part of push)

push %esp is sometimes useful, e.g. to push the address of some stack space you just reserved:

  sub   $8, %esp
  push  %esp
  push  $fmt         # "%lf"
  call  scanf
  movsd 8(%esp), %xmm0

  # add $8, %esp    # balance out the pushes at some point, or just keep using that allocated space for something.  Or clean it up just before returning along with the space for your local var.
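The offsets in that snippet can be double-checked with a few lines of Python that track ESP through the three instructions before the call. The function name and the starting ESP of 0x1000 are hypothetical. The sketch assumes 32-bit code and a callee that returns with ESP unchanged apart from popping the return address, as with the usual cdecl convention.

```python
def scanf_call_layout(esp):
    """Track ESP through: sub $8,%esp / push %esp / push $fmt."""
    esp -= 8              # sub $8, %esp: reserve 8 bytes for the double
    buf = esp             # address of the reserved space
    pushed_ptr = esp      # push %esp stores the value ESP had before the push
    esp -= 4              # ... and then ESP drops by 4
    esp -= 4              # push $fmt
    return esp, buf, pushed_ptr


esp, buf, pushed_ptr = scanf_call_layout(0x1000)
print(hex(esp), hex(buf), hex(pushed_ptr))
```

The pointer argument equals the address of the reserved space, and that space is 8 bytes above ESP after the two pushes, which is why the result is loaded from 8(%esp).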

pop %esp costs 3 uops on Skylake, one load (p23) and two ALU for any integer ALU port (2p0156). So it's even less efficient, but it has basically no use-cases. You can't usefully save/restore the stack pointer on the stack; if you know how to get to where you saved it, you can just restore it with add.

answered Oct 20 '22 09:10

Peter Cordes
