I'm doing a project in x86-64 NASM and came across the instruction:
mov rdi, rdi
in the output of a compiler my professor wrote.
I have searched all over but can't find mention of why this would be needed. Does it affect the flags or is it something clever that I don't understand?
To give some context, it's present in a loop right before the same register is decremented with sub.
The instruction mov rdi, rdi is just an inefficient 3-byte NOP, architecturally equivalent to an actual NOP instruction. Assembling it generates the byte sequence
48 89 ff mov rdi, rdi
That can be considered a NOP because it affects neither the flags nor the registers. The only architectural effect is to advance the program counter to the next instruction.
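As a rough illustration of that encoding, the three bytes can be read field by field. This is only a sketch, assuming NASM in 64-bit mode and the standard REX / opcode / ModRM layout; the one-byte nop is included for comparison.

```nasm
bits 64

mov rdi, rdi    ; 48 89 FF
                ;   48 = REX prefix with W=1 (64-bit operand size)
                ;   89 = MOV r/m64, r64
                ;   FF = ModRM 11 111 111: mod=11 (register direct),
                ;        reg=111 (RDI, the source), rm=111 (RDI, the destination)

nop             ; 90: the dedicated one-byte NOP
```

Assembling with a listing file (for example `nasm -f elf64 -l out.lst file.asm`) shows the generated bytes next to each source line.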
It's common to use (multi-byte) NOPs to align the next instruction to a certain address, a popular example being an aligned jump target, especially at the top of a loop.
But in this case, it appears it's just an artifact of code-generation from a non-optimizing compiler, not being used for intentional padding.
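For contrast, intentional padding usually comes from an alignment directive rather than a hand-written register copy. The following is a minimal sketch, assuming NASM with its standard smartalign macro package; the function name count_down, the System V calling convention, and a nonzero count are assumptions made for illustration.

```nasm
%use smartalign         ; NASM standard macro package: align pads with multi-byte NOPs
bits 64
section .text
global count_down       ; hypothetical function name

count_down:
    mov  ecx, edi       ; loop counter from the first integer argument (assumed nonzero)
    align 16            ; padding, if any is needed, is emitted here
.top:                   ; loop top now starts on a 16-byte boundary
    sub  ecx, 1
    jnz  .top
    ret
```

The exact padding bytes depend on NASM's alignment mode, so the listing file is the place to check what was actually emitted.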
It's inefficient compared to a true nop because it won't be special-cased to run more cheaply; its microarchitectural effect is different on current CPUs. It adds a cycle of latency to the dependency chain through RDI and uses an ALU execution unit. (Neither Intel nor AMD CPUs can "eliminate" mov same,same and run it with zero latency in the register-rename stage; they can do that only between different architectural registers. mov rax,rdi, for example, can be about as cheap as a nop on IvyBridge+ and Ryzen, if you don't mind clobbering RAX.)
In your case, you should just remove it instead of replacing it with 66 66 90 (short NOP with redundant operand-size prefixes) or 0F 1F 00 (long NOP), because it's not being used for padding.
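To make that concrete, here is a hypothetical reconstruction of the kind of loop described in the question, before and after the cleanup. The labels and the jnz loop branch are assumptions; the actual compiler output is not shown in the question.

```nasm
bits 64
section .text

; Hypothetical shape of the generated loop
before:
.loop:
    mov  rdi, rdi       ; redundant self-copy: adds latency to the RDI dependency chain
    sub  rdi, 1         ; decrement the counter and set the flags
    jnz  .loop
    ret

; Same loop with the redundant mov removed: same architectural result
after:
.loop:
    sub  rdi, 1
    jnz  .loop
    ret
```

Nothing needs to take the place of the removed instruction, since sub sets the flags that the branch reads.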
If a search took you to this Q&A but you have an instruction like mov edi, edi in 64-bit code, that's unrelated. You're actually looking for any of the following Q&As:
It's not rare to find instructions doing this at the start of a function that takes an unsigned int arg and uses it as an array index, even in optimized compiler output from mainstream compilers.
mov edi, edi ; zero-extend EDI into RDI
It would be more efficient to pick a different destination register to allow mov-elimination to work on modern Intel and AMD CPUs, like mov eax, edi, but compilers often don't do this.
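A small sketch of the two variants follows, assuming the System V x86-64 calling convention with a 32-bit unsigned index in EDI and the base address of an array of 32-bit elements in RSI. The function names are hypothetical.

```nasm
bits 64
section .text

; Zero-extend in place, as compilers often emit it
load_elem_same:
    mov  edi, edi               ; writing EDI zeroes the upper 32 bits of RDI
    mov  eax, [rsi + rdi*4]     ; load element number RDI of the array
    ret

; Zero-extend into a different register, which allows mov-elimination
load_elem_other:
    mov  eax, edi               ; RAX = zero-extended EDI
    mov  eax, [rsi + rax*4]     ; load element number RAX of the array
    ret
```

Both versions return the same element in EAX; the second only changes which register holds the zero-extended index.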