
Why can assembly instructions contain multiplications in the "lea" instruction?

I am working on a very low-level part of an application in which performance is critical.

While investigating the generated assembly, I noticed the following instruction:

lea eax,[edx*8+8]

I am used to seeing additions in memory references (e.g. [edx+4]), but this is the first time I have seen a multiplication.

  • Does this mean that the x86 processor can perform simple multiplications in the lea instruction?
  • Does this multiplication have an impact on the number of cycles needed to execute the instruction?
  • Is the multiplication limited to powers of 2 (I would assume this is the case)?

Thanks in advance.

Patrick asked May 09 '12 08:05


2 Answers

To expand on my comment and to answer the rest of the question...

Yes, it's limited to powers of two (specifically 2, 4, and 8), so no multiplier is needed: it's just a shift. The point is to quickly generate an address from a pointer and an index variable when the data type is a simple 2-, 4-, or 8-byte word. (Though it's often abused for other uses as well.)
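To make the shift-and-add concrete, here is a minimal C sketch of what the instruction from the question computes, assuming 32-bit wraparound arithmetic. The function names are hypothetical and only for illustration. The second function shows one of the "other uses": multiplying by 5 with a base plus a scaled index, which is a pattern compilers commonly emit.

```c
#include <stdint.h>

/* Hypothetical helper: models lea eax,[edx*8+8].
   The scale of 8 is a left shift by 3; the +8 is the displacement.
   No memory is accessed, only the address arithmetic is performed. */
uint32_t lea_edx8_plus8(uint32_t edx)
{
    return (edx << 3) + 8u;
}

/* Hypothetical helper: models lea eax,[eax+eax*4].
   base + index*4 with the same register gives a multiply by 5. */
uint32_t lea_times5(uint32_t eax)
{
    return eax + (eax << 2);
}
```

For example, lea_edx8_plus8(5) gives 5*8+8 = 48, and lea_times5(7) gives 35.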

As for the number of cycles needed: according to Agner Fog's tables, it looks like the latency of the lea instruction is constant on some machines and variable on others.

On Sandy Bridge there's a 2-cycle penalty if it's "complex or rip relative". But the tables don't say what "complex" means, so we can only guess unless you run a benchmark.

Mysticial answered Oct 06 '22 08:10


Actually, this is not something specific to the lea instruction.

This type of addressing is called Scaled Addressing Mode. The multiplication is achieved by a left bit shift, which is trivial in hardware.

You can use scaled addressing with a mov too. For example (note that this is not the same operation; the only similarity is that ebx*4 is a scaled index in the address computation):

 mov edx, [esi+4*ebx]

(source: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html#memory)
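The difference between the two instructions can be sketched in C. This is only an illustration under my own assumptions: the function names are hypothetical, base plays the role of esi, and index plays the role of ebx. The mov form computes the address and then loads 4 bytes from it, while the lea form stops after the address computation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models lea edx,[esi+4*ebx].
   Only the address is computed; memory is not touched. */
const unsigned char *addr_scaled(const unsigned char *base, size_t index)
{
    return base + (index << 2);   /* index*4 done as a shift by 2 */
}

/* Hypothetical helper: models mov edx,[esi+4*ebx].
   Same address computation, followed by a 4-byte load. */
int32_t load_scaled(const unsigned char *base, size_t index)
{
    int32_t value;
    memcpy(&value, addr_scaled(base, index), sizeof value);
    return value;
}
```

With an array of 4-byte integers, load_scaled(base, i) reads the same element as ordinary indexing arr[i], which is exactly the case scaled addressing was designed for.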

For a more complete listing, see this Intel document. Table 2-3 shows that only a scaling of 2, 4, or 8 is allowed.
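The reason nothing else is possible is visible in the encoding. Assuming the table referred to is the one for 32-bit addressing forms with the SIB byte, the scale lives in a 2-bit field (bits 7..6 of that byte), so only four values can be encoded: the three above plus 1, which means no scaling. A minimal sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* Hypothetical helper: extracts the scale factor from a SIB byte.
   Bits 7..6 hold the 2-bit scale field; the factor is 1 << field,
   so the only possible results are 1, 2, 4, and 8. */
unsigned sib_scale(uint8_t sib)
{
    return 1u << (sib >> 6);
}
```

Because the field stores a shift count rather than a multiplier, a factor such as 3 or 16 simply has no encoding.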

Latency (in terms of number of cycles): I don't think this should be affected at all. A shift by a fixed amount is just a matter of wiring, and selecting between the three possible shifts costs about one multiplexer's worth of delay.

ArjunShankar answered Oct 06 '22 09:10