According to MASM's macamd64.inc, regarding rex_push_reg:
"...rex_push_reg must be used in lieu of push_reg when it appears as the first instruction in a function, as the calling standard dictates that functions must not begin with a single byte instruction."
I wasn't able to find any documentation expressing this, however. Is this true? Where is it documented? Why is this the case?
The operative portion of this claim seems to be "the calling standard"—which calling standard? The joke may be an old one, but it remains apt: the great thing about standards is there are so many to choose from.
In this case, since you're speaking of MASM, we can assume that the target platform is Windows, so the relevant standard would be the Windows 64-bit calling convention rather than anything in the official AMD64 specification. Like you, though, I can't find anything there that speaks to this requirement.
I think what this comment is actually referring to is Microsoft's internal standard designed to allow hot patching of system binaries. "Hot patching" means dynamically patching binaries in memory (e.g., to apply a system update) without the need to restart.
The minimum requirement for this to work is that there is room for a 2-byte short JMP instruction to be patched in at the beginning of every function. (Note that a short jump can only transfer execution from −128 to +127 bytes relative to the end of the jump instruction, but that's enough to reach a long jump, which then branches to the patched function provided by the update. In practice, the long jump instruction is patched into the padding between functions.)
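To make the range arithmetic concrete, here is a minimal Python sketch of how the two jumps could be encoded. The opcodes (EB rel8 for a short JMP, E9 rel32 for a near JMP) are the standard x86 encodings; the function names, the addresses, and the assumption that the padding sits immediately before the function entry are made up for illustration. This is not Microsoft's actual patching code.

```python
import struct

def encode_short_jmp(src: int, dst: int) -> bytes:
    """Encode a 2-byte short JMP (EB rel8) located at src that lands on dst."""
    rel = dst - (src + 2)  # displacement is counted from the end of the jump
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return b"\xEB" + struct.pack("<b", rel)

def encode_near_jmp(src: int, dst: int) -> bytes:
    """Encode a 5-byte near JMP (E9 rel32) located at src that lands on dst."""
    rel = dst - (src + 5)  # displacement is counted from the end of the jump
    return b"\xE9" + struct.pack("<i", rel)

# Hypothetical layout: function entry at 0x1000, padding right before it.
func = 0x1000
pad = func - 5          # the near JMP is written into the padding
replacement = 0x9000    # hypothetical address of the updated function

print(encode_near_jmp(pad, replacement).hex())  # e900800000
print(encode_short_jmp(func, pad).hex())        # ebf9
```

The second result is the 2-byte patch that replaces the function's first instruction: it jumps back 7 bytes from the end of the short jump, which is the start of the 5-byte near jump in the padding.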
Therefore, a function cannot begin with a 1-byte instruction, because the 2-byte patch would then overwrite part of the following instruction as well, and a thread that had already executed the first instruction could be left with its instruction pointer in the middle of the newly written jump. (Think about multi-threading race conditions.) So the rule is: if you want to begin a function with a prologue instruction like PUSH RBP that would normally be only 1 byte, you need to add a 1-byte REX prefix. This unnecessary REX prefix is ignored by the CPU and functions essentially as a 1-byte NOP.
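The hazard can be modeled in a few lines of Python. The byte sequences are the standard encodings of push rbp (55), mov rbp, rsp (48 8B EC), and the REX-prefixed push (48 55); the helper patch_is_safe is hypothetical and only checks instruction lengths, so treat it as an illustration of the rule rather than a real verifier.

```python
from itertools import accumulate

def patch_is_safe(instr_lengths, patch_size=2):
    """Return True if no instruction boundary falls strictly inside the patch.

    instr_lengths: byte lengths of the function's leading instructions, in order.
    A boundary inside the patched bytes means a thread could resume execution
    in the middle of the newly written jump.
    """
    # Offsets at which each *following* instruction starts.
    boundaries = list(accumulate(instr_lengths))
    if not boundaries or boundaries[-1] < patch_size:
        return False  # not enough bytes to hold the patch at all
    return not any(b < patch_size for b in boundaries)

# Prologue starting with a plain 1-byte push rbp.
PLAIN = [bytes.fromhex("55"), bytes.fromhex("488BEC")]
# Same prologue, but the push carries a redundant REX prefix (2 bytes).
REX = [bytes.fromhex("4855"), bytes.fromhex("488BEC")]

print(patch_is_safe([len(i) for i in PLAIN]))  # False
print(patch_is_safe([len(i) for i in REX]))    # True
```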
In 32-bit builds, hot patching was provided for by the 2-byte instruction MOV EDI, EDI. This copies the EDI register to itself without affecting flags, so it is effectively a NOP.
For 32-bit builds, you have to specifically pass the /hotpatch switch to the compiler to have it insert this instruction. However, on 64-bit builds, the compiler always acts as if /hotpatch has been specified, so this requirement that the first instruction be 2 bytes in length effectively becomes part of the platform standard.
So, why make this complicated rule instead of just having the compiler insert a 2-byte NOP at the beginning of every function, as is done in 32-bit builds? Well, I can't say for certain, but I can speculate. First, MOV EDI, EDI is not a NOP on x64 because it implicitly zeroes the upper 32 bits of the RDI register. You'd have to choose a different instruction as the NOP, and once you've done that, you might as well rethink the whole business. Second, there is a (slight) performance cost you pay for having that NOP there, and since most instructions in long mode are at least 2 bytes long, it hardly seems worth it to require a pointless NOP instruction when the instruction that would normally be there is sufficient, with only a few exceptions.
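The zero-extension point is easy to see with a toy model. The Python sketch below only imitates the architectural rule that a write to a 32-bit register in 64-bit mode clears the upper half of the full register; the function name and the sample value are made up for illustration.

```python
MASK32 = 0xFFFF_FFFF

def mov_edi_edi_long_mode(rdi: int) -> int:
    """Model the effect of `mov edi, edi` on RDI in 64-bit mode.

    The low 32 bits are copied to themselves, but because the destination
    is a 32-bit register, the upper 32 bits of RDI are zeroed.
    """
    return rdi & MASK32

rdi = 0x1234_5678_9ABC_DEF0
print(hex(mov_edi_edi_long_mode(rdi)))  # 0x9abcdef0
```

If RDI held a 64-bit pointer or the first integer argument under some other convention, that value would be corrupted, which is why the instruction cannot serve as a harmless filler on x64.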