 

Why isn't movl from memory to memory allowed?

I was wondering whether this is allowed in assembly:

 movl (%edx) (%eax) 

I would have guessed that it accesses the memory in the first operand and puts it in the memory of the second operand, something like *a = *b, but I haven't seen any example that does this, so I'm guessing it's not allowed. Also, I've been told that this isn't allowed:

 leal %esi (%edi)

Why is that? Lastly, are there other similar instructions I should be aware of that aren't allowed?

asked Jan 07 '23 04:01 by nochillfam


1 Answer

The normal, efficient way to copy from memory to memory is to load into a temporary register and then store that register. Pick any free register; you could even use movl (%ecx), %ecx / movl %ecx, (%eax) if you don't still need the load address in a register after copying.

There are other ways, like pushl (%ecx) / popl (%edx), or setting up RSI/ESI and RDI/EDI for movsd, but those are slower. It's usually better to just free up a temporary register, even if it means reloading something later, or even storing/reloading some other less-frequently-used value.
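In C terms, the asker's *a = *b is exactly that load-then-store pair. A minimal sketch follows; the function name copy_u32 is hypothetical, and the assembly in the comment is only the kind of sequence a compiler might emit (actual register choice depends on the compiler and calling convention).

```c
#include <assert.h>
#include <stdint.h>

/* Memory-to-memory copy of one 32-bit value, done the only way x86 allows
   with mov: load into a temporary (a register), then store it.
   A compiler might emit something like:
       movl (%ecx), %ecx    # load; the temporary can reuse the address register
       movl %ecx, (%eax)    # store
   Registers vary with compiler and calling convention. */
static void copy_u32(uint32_t *dst, const uint32_t *src)
{
    assert(dst && src);
    uint32_t tmp = *src;   /* load:  source memory -> register */
    *dst = tmp;            /* store: register -> destination memory */
}
```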


Why x86 can't use two explicit memory operands for one instruction:

movl (mem), (mem)         # AT&T syntax
mov dword [eax], [ecx]    ; or the equivalent in Intel-syntax

This is invalid because x86 machine code has no encoding for mov with two addresses. (In fact, no x86 instruction can ever have two arbitrary addressing modes.)

It has mov r32, r/m32 and mov r/m32, r32. Reg-reg moves can be encoded using either the mov r32, r/m32 opcode or the mov r/m32, r32 opcode. Many other instructions also have two opcodes, one where the dest has to be a register, and one where the src has to be a register.

(And there are some specialized forms, like op r/m32, imm32, or, for mov specifically, the accumulator-only movabs rax, [64bit-absolute-address].)
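To make the two-opcode point concrete, here is a small C sketch that builds both machine-code encodings of the reg-reg move AT&T writes as mov %ecx, %eax. The helper names are made up for illustration. The opcode values 0x89 (mov r/m32, r32) and 0x8B (mov r32, r/m32) and the ModRM layout come from the instruction set reference. Both results, 89 C8 and 8B C1, disassemble as the same instruction.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit register numbers as used in the ModRM reg and r/m fields. */
enum reg32 { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

/* ModRM byte layout: mod (2 bits) | reg (3 bits) | r/m (3 bits).
   mod == 3 means r/m names a register instead of a memory operand. */
static uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* mov r/m32, r32 (opcode 0x89): the reg field holds the SOURCE. */
static void enc_mov_89(enum reg32 dst, enum reg32 src, uint8_t out[2])
{
    out[0] = 0x89;
    out[1] = modrm(3, src, dst);
}

/* mov r32, r/m32 (opcode 0x8B): the reg field holds the DESTINATION. */
static void enc_mov_8b(enum reg32 dst, enum reg32 src, uint8_t out[2])
{
    out[0] = 0x8B;
    out[1] = modrm(3, dst, src);
}
```

For a memory operand only one of the two fields can describe it (the r/m field), which is why the direction of a load or store is chosen by the opcode.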

See the x86 instruction set reference manual (HTML scrape; other links in the x86 tag wiki). I used Intel/NASM syntax here because that's what Intel's and AMD's reference manuals use.

Very few instructions can do a load and a store to two different addresses, e.g. movs (string move) and push/pop (mem) (see the related question "What x86 instructions take two (or more) memory operands?"). In all of those cases, at least one of the memory addresses is implicit (implied by the opcode), not an arbitrary choice that could be [eax] or [edi + esi*4 + 123] or whatever.
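An "arbitrary addressing mode" boils down to one expression of the form base + index*scale + disp. The hypothetical helper below evaluates it with 32-bit wraparound; that value is also exactly what lea writes to its destination register, without touching memory. It is a sketch that ignores segment bases and 16/64-bit address sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate one 32-bit x86 effective address: base + index*scale + disp.
   Example: [edi + esi*4 + 123] is ea32(edi, esi, 4, 123).
   uint32_t arithmetic wraps modulo 2^32, like 32-bit address math. */
static uint32_t ea32(uint32_t base, uint32_t index, unsigned scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}
```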

Many ALU instructions are available with a memory destination. This is a read-modify-write on a single memory location, using the same addressing mode for the load and then the store. This shows that the limitation wasn't that 8086 couldn't load and store in one instruction; it was a limitation of decoding complexity (and machine-code compactness / format).
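A memory-destination ALU operation corresponds to C code like *p += x: one address, used for both the load and the store. This is a sketch with a hypothetical function name; the asm in the comment is what compilers commonly produce for 32-bit x86, but exact output depends on compiler, flags and calling convention.

```c
#include <assert.h>
#include <stdint.h>

/* Read-modify-write on one memory location. Compilers commonly emit a
   single memory-destination instruction for this, such as
       addl %edx, (%eax)    # load (%eax), add %edx, store back to (%eax)
   Only ONE addressing mode is involved: the same address is read and written. */
static void add_to_mem(uint32_t *p, uint32_t x)
{
    assert(p);
    *p += x;
}
```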


There are no instructions that take two arbitrary effective addresses (i.e. each specified with a flexible addressing mode). movs has implicit source and destination operands, and push has an implicit destination (the stack slot at [esp] after esp is decremented).
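To show what "implicit operands" means for movs, here is a toy C model of a single movsd (AT&T movsl) step. The struct and function names are hypothetical, memory is a small byte array, and only the direction flag is modelled; rep prefixes, segments and faults are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy machine state: flat memory plus the two implicit pointer registers. */
struct toy_cpu {
    uint8_t mem[64];     /* toy memory */
    uint32_t esi, edi;   /* byte addresses into mem */
    int df;              /* direction flag: 0 = forward, 1 = backward */
};

/* One movsd step: the source is ALWAYS [esi] and the destination is
   ALWAYS [edi]; no ModRM byte chooses either address. */
static void movsd_step(struct toy_cpu *c)
{
    assert(c->esi + 4 <= sizeof c->mem && c->edi + 4 <= sizeof c->mem);
    memmove(&c->mem[c->edi], &c->mem[c->esi], 4);   /* load 4 bytes, then store */
    if (c->df) { c->esi -= 4; c->edi -= 4; }
    else       { c->esi += 4; c->edi += 4; }
}
```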

An x86 instruction has at most one ModRM byte, and a ModRM byte can only encode one reg/memory operand (2 bits for the mode, 3 bits for the r/m field, normally the base register) plus one register-only operand (3 bits). With an escape code in the r/m field, ModRM can signal a SIB byte to encode base + scaled index for the memory operand, but there's still only room to encode one memory operand.
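The field split can be sketched in C as below (hypothetical names; 32-bit addressing only, and the mod == 0, rm == 5 "disp32, no base" special case is not singled out). For example, 8B 44 B7 7B is mov eax, [edi + esi*4 + 123]: ModRM 0x44 has mod = 1 (a disp8 follows), reg = 0 (eax) and rm = 4, the SIB escape; the SIB byte 0xB7 then carries scale 4, index esi and base edi. There is still only one r/m field for a memory operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded ModRM fields. There is exactly one r/m field, so at most one
   operand can be a memory reference; reg is always a register
   (or an opcode extension for some instructions). */
struct modrm_fields {
    unsigned mod, reg, rm;
    bool rm_is_memory;   /* true when mod != 3 */
    bool has_sib;        /* memory operand with rm == 4: a SIB byte follows */
};

static struct modrm_fields decode_modrm(uint8_t b)
{
    struct modrm_fields f;
    f.mod = b >> 6;
    f.reg = (b >> 3) & 7;
    f.rm  = b & 7;
    f.rm_is_memory = (f.mod != 3);
    f.has_sib = f.rm_is_memory && f.rm == 4;
    return f;
}
```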

As I mentioned above, the memory-source and memory-destination forms of the same instruction (asm source mnemonic) use two different opcodes. As far as the hardware is concerned, they are different instructions.


The reasons for this design choice are probably partly implementation complexity. If it's possible for a single instruction to need two results from an AGU (address-generation unit), then the wiring has to be there to make that possible. Some of this complexity is in the decoders that figure out which instruction an opcode is, and parse the remaining bits / bytes to figure out what the operands are. Since no other instruction can have multiple r/m operands, it would cost extra transistors (silicon area) to support a way to encode two arbitrary addressing modes. The same goes for the logic that has to figure out how long an instruction is, so it knows where to start decoding the next one.

It also potentially gives an instruction five input dependencies (a two-register addressing mode for the store address, the same for the load address, and FLAGS if it's adc or sbb). But when 8086 / 80386 were being designed, superscalar / out-of-order execution and dependency tracking probably weren't on the radar. 386 added a lot of new instructions, so a mem-to-mem encoding of mov could have been added, but wasn't. If 386 had started to forward results directly from ALU output to ALU input and stuff like that (to reduce latency compared to always committing results to the register file), then this would have been one of the reasons it wasn't implemented.

If it existed, Intel P6 would probably decode it to two separate uops, a load and a store. It certainly wouldn't make sense to introduce now, or any time after 1995 when P6 was designed and simpler instructions gained more of a speed advantage over complex ones. (See http://agner.org/optimize/ for stuff about making code run fast.)

I can't see this being very useful, anyway, at least not compared to the cost in code-density. If you want this, you're probably not making enough use of registers. Figure out how to process your data on the fly while copying, if possible. Of course, sometimes you just have to do a load and then a store, e.g. in a sort routine to swap the rest of a struct after comparing based on one member. Doing moves in larger blocks (e.g. using xmm registers) is a good idea.
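As a rough C illustration of moving data in larger blocks: copying in 16-byte chunks with a fixed-size memcpy is a common idiom that optimizing x86 compilers with SSE usually turn into one 16-byte xmm load and one store per chunk. That lowering is typical rather than guaranteed, and copy_blocks16 is a hypothetical helper, not a replacement for the library memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes in 16-byte chunks. Each fixed-size 16-byte memcpy is still a
   load followed by a store, just register-sized for an xmm register. */
static void copy_blocks16(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst && src));
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        memcpy(d + i, s + i, 16);      /* one 16-byte chunk */
    if (i < n)
        memcpy(d + i, s + i, n - i);   /* leftover tail, fewer than 16 bytes */
}
```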


   leal %esi, (%edi)

This is AT&T syntax, lea src, dst. So lea (%edi), %esi is an inefficient equivalent to mov %edi, %esi, but in the other order there are two problems:

First, registers don't have addresses. A bare %esi is not a valid effective address, so it's not a valid source for lea.

Second, lea's destination must be a register. There's no encoding where it takes a second effective address to store the result to memory.


You left out the , between the two operands, so that's a showstopper before you even get to restrictions on what the operands can be. The assembler rejects it outright:

   valid-asm.s:2: Error: number of operands mismatch for `lea'

The discussion of lea above assumes that syntax error has been fixed.
answered May 11 '23 23:05 by Peter Cordes