It currently seems to me that the only reason we have instructions like “PUSH” is to replace multiple MOV and arithmetic instructions with a single instruction.
Is there anything “PUSH” does that cannot be accomplished by more primitive instructions?
Is “PUSH” just a single mnemonic that compiles into multiple machine code instructions?
Push is a real machine instruction (https://www.felixcloutier.com/x86/push), not just an assembler macro / pseudo-instruction. For example, push rax has a single-byte encoding of 0x50.
But yes, you can emulate it using other instructions like sub rsp, 8 and a mov store. (This is normal for a CISC machine like x86!) e.g. see "What is the function of the push / pop instructions used on registers in x86 assembly?"
To emulate it exactly (without modifying flags), you use LEA instead of ADD/SUB.
lea rsp, [rsp-8]
mov qword [rsp], 123 ; push 123 in 64-bit mode
See also "What is an assembly-level representation of pushl/popl %esp?" for equivalent instructions that match the behaviour even for push rsp / pop rsp, push/pop [rsp+16], as well as for any other operand (immediate, reg, or mem).
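The push rsp corner case mentioned above is easy to get wrong, so here is a minimal sketch in C of a toy machine. It is only a model, not real machine code: ToyMachine, store64, load64 and the three push functions are hypothetical names invented for this illustration. It assumes the documented x86-64 behaviour that push rsp stores the value RSP had before the decrement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine: a stack pointer plus a small byte-addressed memory. */
typedef struct {
    uint64_t rsp;
    uint8_t mem[128];
} ToyMachine;

static void store64(ToyMachine *m, uint64_t addr, uint64_t value) {
    assert(addr + 8 <= sizeof m->mem);
    memcpy(&m->mem[addr], &value, sizeof value);
}

static uint64_t load64(const ToyMachine *m, uint64_t addr) {
    uint64_t value;
    assert(addr + 8 <= sizeof m->mem);
    memcpy(&value, &m->mem[addr], sizeof value);
    return value;
}

/* Models `push rsp`: the operand is read before RSP is decremented. */
static void push_rsp(ToyMachine *m) {
    uint64_t old_rsp = m->rsp;
    m->rsp -= 8;
    store64(m, m->rsp, old_rsp);
}

/* Models the naive pair `lea rsp, [rsp-8]` / `mov [rsp], rsp`:
   it stores the NEW value of RSP, so it does not match `push rsp`. */
static void naive_emulated_push_rsp(ToyMachine *m) {
    m->rsp -= 8;
    store64(m, m->rsp, m->rsp);
}

/* Models an emulation that first copies RSP to a scratch register:
   `mov tmp, rsp` / `lea rsp, [rsp-8]` / `mov [rsp], tmp`. */
static void scratch_emulated_push_rsp(ToyMachine *m) {
    uint64_t tmp = m->rsp;       /* mov tmp, rsp */
    m->rsp -= 8;                 /* lea rsp, [rsp-8] (no flag changes) */
    store64(m, m->rsp, tmp);     /* mov [rsp], tmp */
}
```

Starting each variant from the same RSP and comparing load64 at the new top of stack shows that only the scratch-register version agrees with the modelled push rsp. The real instruction needs no scratch register, which is one more small convenience of having it in the ISA.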
Is there anything “PUSH” does that cannot be accomplished by more primitive instructions?
Nothing significant beyond efficiency and code-size.
Single instructions are atomic wrt. interrupts - they either happen or they don't. This is normally totally irrelevant; asynchronous interrupts don't usually look at the stack / register contents of the code that got interrupted.
PUSH can get the job done in a single byte of machine code for pushing a single register, or 2 bytes for a small immediate. A multi-instruction sequence is much larger. The architect of 8086's ISA was very focused on making small code-size possible, so yes it's totally normal to have an instruction that replaces a couple longer instructions with one short one. e.g. we have not instead of having to use xor reg, -1, and inc instead of add reg, 1. (Although again those both have different FLAGS semantics, with NOT leaving flags untouched and INC/DEC leaving CF untouched.) Not to mention all of x86's other special-case encodings, like 1-byte encodings for xchg-with-[e/r]ax. See https://codegolf.stackexchange.com/questions/132981/tips-for-golfing-in-x86-x64-machine-code
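To put numbers on the code-size point, here is a small C table of the x86-64 encodings for the sequences discussed in this answer. The byte values are what I would expect an assembler such as NASM to emit in 64-bit mode (worth confirming against your own assembler's listing), and the array and function names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expected 64-bit mode machine code for each instruction (assumed encodings). */
static const uint8_t push_rax[]      = { 0x50 };                          /* push rax */
static const uint8_t push_imm8_123[] = { 0x6A, 0x7B };                    /* push 123 */
static const uint8_t sub_rsp_8[]     = { 0x48, 0x83, 0xEC, 0x08 };        /* sub rsp, 8 */
static const uint8_t lea_rsp_m8[]    = { 0x48, 0x8D, 0x64, 0x24, 0xF8 };  /* lea rsp, [rsp-8] */
static const uint8_t mov_store_rax[] = { 0x48, 0x89, 0x04, 0x24 };        /* mov [rsp], rax */
static const uint8_t mov_store_123[] = { 0x48, 0xC7, 0x04, 0x24,
                                         0x7B, 0x00, 0x00, 0x00 };        /* mov qword [rsp], 123 */

/* Code size in bytes of each way to push RAX. */
static size_t size_push_rax(void)    { return sizeof push_rax; }
static size_t size_sub_mov_rax(void) { return sizeof sub_rsp_8 + sizeof mov_store_rax; }
static size_t size_lea_mov_rax(void) { return sizeof lea_rsp_m8 + sizeof mov_store_rax; }

/* Code size in bytes of each way to push the immediate 123. */
static size_t size_push_imm(void)    { return sizeof push_imm8_123; }
static size_t size_lea_mov_imm(void) { return sizeof lea_rsp_m8 + sizeof mov_store_123; }
```

Under these encodings, pushing RAX costs 1 byte with push versus 8 bytes with sub + mov or 9 bytes with the flag-preserving lea + mov, and pushing 123 costs 2 bytes versus 13.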
Also efficiency: PUSH decodes to a single uop (in the fused domain) on Pentium-M and later CPUs, thanks to the stack engine that handles implicit uses of the stack pointer by instructions like push/pop and call/ret. 2 separate instructions of course decode to at least 2 uops. (Except the special case of macro-fusion of test/cmp + JCC).
On ancient P5 Pentium, emulating push with separate ALU and mov instructions was actually a win: before PPro, CPUs didn't know how to break down complex CISC instructions into separate uops, and complex instructions couldn't pair in P5's dual-issue in-order pipeline. (See Agner Fog's microarch guide.) The main benefit here was being able to mix in other instructions that could pair, and to only do one big sub and then just the mov stores instead of multiple changes to the stack pointer.
This also applies to early P6-family before the stack engine. GCC with -march=pentium3 for example will tend to avoid push and just do one bigger adjustment to ESP.
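The "one big sub, then just mov stores" pattern can be sketched with a toy C model. This is an illustration only, using 64-bit slots for consistency with the earlier examples; ToyMachine, store64, push_three, sub_then_store_three and same_stack are hypothetical names. It says nothing about timing, only that both sequences leave the same stack contents while touching the stack pointer a different number of times.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine that also counts writes to the stack pointer. */
typedef struct {
    uint64_t rsp;
    uint8_t mem[128];
    int rsp_updates;
} ToyMachine;

static void store64(ToyMachine *m, uint64_t addr, uint64_t value) {
    assert(addr + 8 <= sizeof m->mem);
    memcpy(&m->mem[addr], &value, sizeof value);
}

/* Models `push a` / `push b` / `push c`: three stack-pointer updates. */
static void push_three(ToyMachine *m, uint64_t a, uint64_t b, uint64_t c) {
    const uint64_t values[3] = { a, b, c };
    for (int i = 0; i < 3; i++) {
        m->rsp -= 8;
        m->rsp_updates++;
        store64(m, m->rsp, values[i]);
    }
}

/* Models `sub rsp, 24` followed by three mov stores: one stack-pointer update.
   (Unlike push, a real sub also writes FLAGS; the model ignores flags.) */
static void sub_then_store_three(ToyMachine *m, uint64_t a, uint64_t b, uint64_t c) {
    m->rsp -= 24;
    m->rsp_updates++;
    store64(m, m->rsp + 16, a);  /* first value pushed sits at the highest address */
    store64(m, m->rsp + 8, b);
    store64(m, m->rsp, c);
}

/* True when both machines have the same stack pointer and memory contents. */
static int same_stack(const ToyMachine *x, const ToyMachine *y) {
    return x->rsp == y->rsp && memcmp(x->mem, y->mem, sizeof x->mem) == 0;
}
```

Running both functions from the same starting RSP and comparing with same_stack confirms the two layouts match, with rsp_updates at 3 for the push version and 1 for the sub version.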