Does nasm have any built-in way to emit long-nop (aka multi-byte nops) instructions of a given length?
The answer seems to be no: out of the box, nasm has no official way to emit these long-nops[1].
So I wrote my own macros for 1 to 9 bytes, based on the recommended sequences from the Intel manuals[2]:
;; long-nop instructions: nopX inserts a nop of X bytes
;; see "Table 4-12. Recommended Multi-Byte Sequence of NOP Instruction" in
;; "Intel® 64 and IA-32 Architectures Software Developer’s Manual" (325383-061US)
%define nop1 nop ; just a nop, included for completeness
%define nop2 db 0x66, 0x90 ; 66 NOP
%define nop3 db 0x0F, 0x1F, 0x00 ; NOP DWORD ptr [EAX]
%define nop4 db 0x0F, 0x1F, 0x40, 0x00 ; NOP DWORD ptr [EAX + 00H]
%define nop5 db 0x0F, 0x1F, 0x44, 0x00, 0x00 ; NOP DWORD ptr [EAX + EAX*1 + 00H]
%define nop6 db 0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00 ; 66 NOP DWORD ptr [EAX + EAX*1 + 00H]
%define nop7 db 0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00 ; NOP DWORD ptr [EAX + 00000000H]
%define nop8 db 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 ; NOP DWORD ptr [EAX + EAX*1 + 00000000H]
%define nop9 db 0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 ; 66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]
I've also added these to the nasm-utils project, so that's one way to get them if you have the same need.
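For padding lengths that are not a single fixed size, the macros above can be wrapped in a small dispatcher. This is only a sketch: the name `nop_pad` and the helper symbol `nop_pad_rest` are hypothetical (they are not part of nasm or nasm-utils), it assumes the `nop1` to `nop9` definitions above are already in scope, and it relies on the `%[...]` macro-indirection syntax available in recent NASM 2.x releases.

```nasm
; nop_pad N: emit exactly N bytes of padding using as few nops as possible
; (hypothetical helper; requires the nop1..nop9 definitions shown above)
%macro nop_pad 1
    %rep (%1) / 9                       ; as many 9-byte nops as fit
        nop9
    %endrep
    %assign nop_pad_rest (%1) % 9       ; remaining 0..8 bytes
    %if nop_pad_rest > 0
        nop%[nop_pad_rest]              ; pastes to nop1..nop8, then expands
    %endif
%endmacro

    nop_pad 13                          ; one nop9 followed by one nop4
```

Using the largest nop first keeps the instruction count low, which matches the general advice to use the fewest nops for a given amount of padding.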
[1] Although, as Jester points out, you can dig into the internals to find some macros used to implement the "smart align" feature.
[2] For the record, I believe these first appeared in the AMD manuals, and Intel eventually adopted the same recommended sequences.
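For reference, the "smart align" feature mentioned in the first footnote is exposed through NASM's standard `smartalign` macro package. It does not give you a long-nop of a chosen length on demand, but it does make `align` pad with multi-byte nops. A minimal sketch, assuming a NASM 2.x build that ships the package (the labels are illustrative only):

```nasm
bits 32
%use smartalign
alignmode p6            ; choose the Intel-style long-nop padding sequences

section .text
entry:
    ret                 ; 1 byte, so the align below has 15 bytes to fill
    align 16            ; padded with multi-byte nops rather than single 0x90 bytes
aligned_target:
    ret
```

Disassembling the output (for example with ndisasm) is an easy way to check which sequences your NASM version actually emits.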
Quoting page 124 (3-28) of the December 2017 edition of the Intel optimization manual (https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf):
3.5.1.10 Using NOPs
Code generators generate a no-operation (NOP) to align instructions. Examples of NOPs of different lengths in 32-bit mode are shown below:
1-byte: XCHG EAX, EAX
2-byte: 66 NOP
3-byte: LEA REG, 0 (REG) (8-bit displacement)
4-byte: NOP DWORD PTR [EAX + 0] (8-bit displacement)
5-byte: NOP DWORD PTR [EAX + EAX*1 + 0] (8-bit displacement)
6-byte: LEA REG, 0 (REG) (32-bit displacement)
7-byte: NOP DWORD PTR [EAX + 0] (32-bit displacement)
8-byte: NOP DWORD PTR [EAX + EAX*1 + 0] (32-bit displacement)
9-byte: NOP WORD PTR [EAX + EAX*1 + 0] (32-bit displacement)
These are all true NOPs, having no effect on the state of the machine except to advance the EIP.
Because NOPs require hardware resources to decode and execute, use the fewest number to achieve the desired padding.
The one byte NOP:[XCHG EAX,EAX] has special hardware support. Although it still consumes a µop and its accompanying resources, the dependence upon the old value of EAX is removed.
This µop can be executed at the earliest possible opportunity, reducing the number of outstanding instructions and is the lowest cost NOP.
The other NOPs have no special hardware support. Their input and output registers are interpreted by the hardware. Therefore, a code generator should arrange to use the register containing the oldest value as input, so that the NOP will dispatch and release RS resources at the earliest possible opportunity.
Try to observe the following NOP generation priority:
• Select the smallest number of NOPs and pseudo-NOPs to provide the desired padding.
• Select NOPs that are least likely to execute on slower execution unit clusters.
• Select the register arguments of NOPs to reduce dependencies.
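As an illustration of the last point, the register a long-nop names can be changed by editing the ModRM byte. The sketch below is a hypothetical variant of the 4-byte form that reads ECX instead of EAX; the macro name `nop4_ecx` is made up for this example, and whether ECX is actually the better choice depends on the surrounding code.

```nasm
; ModRM 0x41 = mod 01 (8-bit displacement), reg field /0, r/m 001 (ECX)
; compare with 0x40 in the EAX form, where r/m is 000 (EAX)
%define nop4_ecx db 0x0F, 0x1F, 0x41, 0x00 ; NOP DWORD ptr [ECX + 00H]
```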