I'm trying to understand the x64 assembly optimization that is done by the compiler. I compiled a small C++ project as a Release build with the Visual Studio 2008 SP1 IDE on Windows 8.1.
One of the lines contained the following assembly code:
B8 31 00 00 00 mov eax,31h
0F 1F 44 00 00 nop dword ptr [rax+rax]
As far as I know, nop by itself does nothing, but I've never seen it with an operand like that.
Can someone explain what it does?
In a comment elsewhere on this page, Michael Petch points to a web page which describes the Intel x86 multi-byte NOP opcodes. The page has a table of useful information, but unfortunately the HTML is messed up so you can't read it. Here is some information from that page, plus that table presented in a readable form:
Multi-Byte NOP
http://www.felixcloutier.com/x86/NOP.html
The one-byte NOP instruction is an alias mnemonic for the XCHG (E)AX, (E)AX instruction. The multi-byte NOP instruction performs no operation on supported processors and generates undefined opcode exception on processors that do not support the multi-byte NOP instruction.
The memory operand form of the instruction allows software to create a byte sequence of “no operation” as one instruction.
For situations where multiple-byte NOPs are needed, the recommended operations (32-bit mode and 64-bit mode) are: [my edit: in 64-bit mode, write rax instead of eax.]

Length   Assembly                                    Byte Sequence
-------  ------------------------------------------  --------------------------
1 byte   nop                                         90
2 bytes  66 nop                                      66 90
3 bytes  nop dword ptr [eax]                         0F 1F 00
4 bytes  nop dword ptr [eax + 00h]                   0F 1F 40 00
5 bytes  nop dword ptr [eax + eax*1 + 00h]           0F 1F 44 00 00
6 bytes  66 nop word ptr [eax + eax*1 + 00h]         66 0F 1F 44 00 00
7 bytes  nop dword ptr [eax + 00000000h]             0F 1F 80 00 00 00 00
8 bytes  nop dword ptr [eax + eax*1 + 00000000h]     0F 1F 84 00 00 00 00 00
9 bytes  66 nop word ptr [eax + eax*1 + 00000000h]   66 0F 1F 84 00 00 00 00 00
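To make the table concrete, here is a minimal Python sketch that stores the nine recommended encodings and builds a padding run of a requested size from them. The byte values are copied from the table; the names RECOMMENDED_NOPS and nop_padding, and the "longest NOP first" policy, are my own assumptions for illustration. A real compiler or assembler may split the padding differently.

```python
# Recommended multi-byte NOP encodings, keyed by their length in bytes.
# Byte values are copied from the table above.
RECOMMENDED_NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("66 90"),
    3: bytes.fromhex("0F 1F 00"),
    4: bytes.fromhex("0F 1F 40 00"),
    5: bytes.fromhex("0F 1F 44 00 00"),
    6: bytes.fromhex("66 0F 1F 44 00 00"),
    7: bytes.fromhex("0F 1F 80 00 00 00 00"),
    8: bytes.fromhex("0F 1F 84 00 00 00 00 00"),
    9: bytes.fromhex("66 0F 1F 84 00 00 00 00 00"),
}


def nop_padding(length):
    """Return `length` bytes of padding using as few NOP instructions as possible.

    Hypothetical helper: it greedily emits the longest table entry that fits.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    remaining = length
    while remaining > 0:
        chunk = min(remaining, 9)  # the longest recommended NOP is 9 bytes
        out += RECOMMENDED_NOPS[chunk]
        remaining -= chunk
    return bytes(out)
```

Asking for 5 bytes of padding returns 0F 1F 44 00 00, which is exactly the instruction in the question.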
Note that the technique for selecting the right byte sequence--and thus the desired total size--may differ according to which assembler you are using.
For example, the following two lines of assembly taken from the table are ostensibly similar:
nop dword ptr [eax + 00h]
nop dword ptr [eax + 00000000h]
These differ only in the number of leading zeros, and some assemblers may make it hard to disable their "helpful" feature of always encoding the shortest possible byte sequence, which could make the second expression inaccessible.
For the multi-byte NOP situation, you don't want this "help" because you need to make sure that you actually get the desired number of bytes. So the issue is how to specify an exact combination of mod and r/m bits that ends up with the desired disp size--but via instruction mnemonics alone. This topic is complex, and certainly beyond the scope of my knowledge, but Scaled Indexing, MOD+R/M and SIB might be a starting place.
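As a rough illustration of what the mod and r/m bits control, the following Python sketch splits a 0F 1F NOP encoding into its ModRM-level fields and reports how many displacement bytes follow. The function name decode_long_nop is hypothetical, and the decoder is deliberately simplified: it assumes 32-bit or 64-bit addressing and knows only about an optional 66 prefix, which is enough for the encodings in the table but not for arbitrary instructions.

```python
def decode_long_nop(code):
    """Split a 0F 1F multi-byte NOP into prefix, ModRM fields, and displacement size.

    Simplified sketch: optional 66 prefix, 32/64-bit addressing only.
    """
    pos = 0
    prefix_66 = code[pos] == 0x66
    if prefix_66:
        pos += 1
    if code[pos:pos + 2] != b"\x0f\x1f":
        raise ValueError("not a 0F 1F multi-byte NOP")
    pos += 2

    modrm = code[pos]
    pos += 1
    mod = modrm >> 6      # top two bits: addressing form
    rm = modrm & 0b111    # bottom three bits: base register or SIB escape

    # r/m = 100 with a memory form means a SIB byte follows the ModRM byte.
    has_sib = mod != 0b11 and rm == 0b100
    sib_base = None
    if has_sib:
        sib_base = code[pos] & 0b111
        pos += 1

    if mod == 0b01:
        disp_bytes = 1    # 8-bit displacement
    elif mod == 0b10:
        disp_bytes = 4    # 32-bit displacement
    elif mod == 0b00 and (rm == 0b101 or sib_base == 0b101):
        disp_bytes = 4    # special case: displacement with no base register
    else:
        disp_bytes = 0    # no displacement (or register form, mod = 11)

    return {
        "prefix_66": prefix_66,
        "mod": mod,
        "rm": rm,
        "sib": has_sib,
        "disp_bytes": disp_bytes,
        "length": pos + disp_bytes,
    }
```

Decoding 0F 1F 40 00 gives mod = 01 and one displacement byte, while 0F 1F 80 00 00 00 00 gives mod = 10 and four displacement bytes. That is the whole difference between the two "ostensibly similar" lines: the same zero displacement stored in a different width.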
Now as I know you were just thinking, if you find it difficult or impossible to coerce your assembler's cooperation via instruction mnemonics you can always just resort to db ("define bytes") as a simple no-fuss alternative which is, um, guaranteed to work.
As pointed out in the comments, it is a multi-byte NOP usually used to align the subsequent instruction to a 16-byte boundary, when that instruction is the first instruction in a loop.
Such alignment can help with instruction fetch bandwidth, because instruction fetch often happens in units of 16 bytes, so aligning the top of a loop gives the greatest chance that the decoding occurs without bottlenecks.
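The amount of padding needed is simple arithmetic, sketched below in Python. The function name padding_to_boundary and the example addresses are hypothetical; the question does not show the real addresses, so this only illustrates how a 5-byte NOP could arise.

```python
def padding_to_boundary(address, boundary=16):
    """Number of bytes needed to advance `address` to the next multiple of `boundary`."""
    return (-address) % boundary


# Hypothetical layout: a 5-byte "mov eax,31h" starting at 0x1006 ends at 0x100B,
# so 5 bytes of padding place the next instruction (the loop top) at 0x1010.
mov_start = 0x1006
mov_length = 5
pad = padding_to_boundary(mov_start + mov_length)
loop_top = mov_start + mov_length + pad
```

With these assumed addresses the padding comes out to 5 bytes, which a compiler can fill with the single 5-byte NOP from the question rather than five separate one-byte NOPs.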
Such alignment is arguably less important than it once was, since the introduction of the loop buffer and the uop cache, which are less sensitive to alignment. In some cases this optimization may even be a pessimization, especially when the loop executes very few times.