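The LEA (Load Effective Address) instruction is a way of obtaining the address which arises from any of the Intel processor's memory addressing modes. That is to say, if we have a data move like this: MOV EAX, <MEM-OPERAND> it moves the contents of the designated memory location into the target register. If we replace the MOV with LEA, the address is calculated in exactly the same way by the addressing expression, but instead of the contents of the memory location, the address itself is placed in the target register.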
The MOV instruction is a 'MOVe', effectively a copy: with a memory operand it copies the contents stored at the address that the TABLE-ADDR label stands for. The LEA instruction is a 'Load Effective Address': it does not dereference anything, but computes the address described by its operand and places that address in the destination register without accessing memory.
LEA: used to load the address of the operand into the provided register. LES: used to load the ES register and the other provided register from memory.
The lea (load effective address) instruction is used to put a memory address into the destination.
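To make the difference between loading a value and loading its address concrete, here is a minimal C sketch. The array `table` and the function names `mov_like` and `lea_like` are hypothetical, chosen only for illustration; a compiler is free to emit other instructions for this code.

```c
#include <stdint.h>

/* Hypothetical 4-element table standing in for a memory operand. */
static const uint32_t table[4] = {10, 20, 30, 40};

/* Roughly what MOV EAX, [table + i*4] does:
   read the value stored at the computed address. */
uint32_t mov_like(uint32_t i) {
    return table[i];
}

/* Roughly what LEA EAX, [table + i*4] does:
   compute the same address, but do not read memory. */
const uint32_t *lea_like(uint32_t i) {
    return &table[i];
}
```

Both functions evaluate the same addressing expression; only the first one touches the memory behind it.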
One significant difference between LEA and ADD on x86 CPUs is the execution unit which actually performs the instruction. Modern x86 CPUs are superscalar and have multiple execution units that operate in parallel, with the pipeline feeding them somewhat like round-robin (bar stalls). On designs that implement it this way, LEA is processed by (one of) the unit(s) dealing with addressing (which happens at an early stage in the pipeline), while ADD goes to the ALU(s) (arithmetic/logical units), late in the pipeline. That means such a superscalar x86 CPU can concurrently execute a LEA and an arithmetic/logical instruction.
The fact that LEA goes through the address generation logic instead of the arithmetic units is also the reason why it used to be called "zero-clocks"; it takes no time to execute because address generation has already happened by the time it would be / is executed.
It's not free, since address generation is a step in the execution pipeline, but it's got no execution overhead. And it doesn't occupy a slot in the ALU pipeline(s).
Edit: To clarify, LEA is not free. Even on CPUs that do not implement it via the arithmetic unit, it takes time to execute due to instruction decode / dispatch / retire and/or other pipeline stages that all instructions go through. The time taken to do LEA just occurs in a different stage of the pipeline for CPUs that implement it via address generation.
I'm smart enough to trust the compiler, so can anyone shed some light on which one is better?
Yes, a little. Firstly, I'm taking this from the following message: https://groups.google.com/group/bsdnt-devel/msg/23a48bb18571b9a6
In this message a developer optimises some assembly I wrote very badly so that it runs crazily fast on Intel Core 2 processors. As background, the project is a BSD bignum library which I and a few other developers have been involved in.
In this case, all that's being optimised is addition of two arrays that look like this: uint64_t* x, uint64_t* y. Each "limb" or member of the array represents part of the bignum; the basic process is to iterate over it starting from the least significant limb, add the pair up and continue upwards, passing the carry (any overflow) up each time. adc does this for you on a processor (I don't think it's possible to access the carry flag from C).
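As a rough illustration of the limb-by-limb process described above, here is a minimal sketch in portable C. It is not the bsdnt code; the function name `add_limbs` and its signature are assumptions made for this example. Since C cannot read the carry flag directly, the sketch recomputes the carry by checking whether each unsigned addition wrapped around.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds the n-limb numbers x and y (least significant limb first) into r.
   Returns the carry out of the most significant limb (0 or 1). */
uint64_t add_limbs(uint64_t *r, const uint64_t *x, const uint64_t *y, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = x[i] + carry;   /* wraps modulo 2^64 */
        uint64_t c1 = s < carry;     /* 1 if adding the carry wrapped */
        uint64_t t = s + y[i];
        uint64_t c2 = t < s;         /* 1 if adding y[i] wrapped */
        r[i] = t;
        carry = c1 | c2;             /* at most one of the two can wrap */
    }
    return carry;
}
```

A hand-written assembly loop can instead keep the carry in the flag between adc instructions, which is why the loop-control instructions must not disturb the flags.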
In that piece of code, a combination of lea something, [something+1] and jrcxz is used, which is apparently more efficient than the jnz/add something, size pair we might previously have used. I'm not sure whether this was discovered simply by testing different instructions, however. You'd have to ask.
However, in a later message, it is measured on an AMD chip and does not perform so well.
I'm also given to understand that different operations perform differently on different processors. I know, for example, that the GMP project detects processors using cpuid and passes in different assembly routines based on different architectures, e.g. core2, nehalem.
The question you have to ask yourself is does your compiler produce optimised output for your cpu architecture? The Intel compiler, for example, is known to do this, so it might be worth measuring performance and seeing what output it produces.
LEA isn't faster than the ADD instruction; the execution speed is the same.
But LEA sometimes offers more than ADD. If we need a simple, fast addition/multiplication in combination with a second register, then LEA can speed up program execution. On the other hand, LEA doesn't affect the CPU flags, so there is no way to detect overflow.
The main reason is as follows. If you look carefully at x86, you will notice that the ISA is essentially two-address: almost every classic integer instruction accepts at most two arguments. Thus, the semantics of an operation are:
DST = DST <operation> SRC
LEA is a kind of hack instruction, because it is one of the very few instructions in the classic x86 integer ISA that is actually three-address (the immediate form of IMUL is another):
DST = SRC1 <operation> SRC2
It is a hack in that it reuses the address-calculation circuitry of the x86 CPU to perform addition and shift.
Compilers use LEA because this instruction lets them replace several instructions with a single one in cases where it is beneficial to leave the contents of the summand registers unchanged. Note that whenever a compiler uses LEA, either the DST register differs from the SRC register or the SRC argument exploits the complex address calculation logic.
For example, it is almost impossible to find use cases like these in generated code:
LEA EAX, [EAX ] // equivalent of NOP
LEA EAX, [ECX ] // equivalent of MOV EAX, ECX
LEA EAX, [EAX+12] // equivalent of ADD EAX, 12
but the next use cases are common:
LEA EAX, [ECX +12] // there is no single-instruction equivalent
LEA EAX, [ECX+EDX*4+12] // there is no single-instruction equivalent
LEA EDX, [ECX+EDX*4+12] // there is no single-instruction equivalent
Indeed, imagine the following scenario, assuming that the value of EBP must be preserved for future use:
LEA EAX, [EBP+12]
LEA EDX, [EBP+48]
Just two instructions! Without LEA, the code would be:
MOV EAX, EBP
MOV EDX, EBP
ADD EAX, 12
ADD EDX, 48
I believe the benefit of using LEA should be evident now. You can try replacing this instruction
LEA EDX, [ECX+EDX*4+12] // there is no single-instruction equivalent
with ADD-based code.
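One possible answer to that exercise can be sketched in C, modelling the 32-bit registers as uint32_t values. The function names `lea32` and `without_lea` are hypothetical, and the SHL/ADD/ADD sequence in the comments is just one way to do it, not necessarily what a compiler would choose.

```c
#include <stdint.h>

/* Model of LEA dst, [base + index*scale + disp] on 32-bit registers.
   The arithmetic wraps modulo 2^32 and no flags are produced. */
uint32_t lea32(uint32_t base, uint32_t index, uint32_t scale, uint32_t disp) {
    return base + index * scale + disp;
}

/* One possible replacement for LEA EDX, [ECX+EDX*4+12] without LEA:
     SHL EDX, 2
     ADD EDX, ECX
     ADD EDX, 12
   Unlike LEA, this sequence overwrites the flags. */
uint32_t without_lea(uint32_t ecx, uint32_t edx) {
    edx <<= 2;   /* SHL EDX, 2   */
    edx += ecx;  /* ADD EDX, ECX */
    edx += 12;   /* ADD EDX, 12  */
    return edx;
}
```

The replacement works here only because the destination is also the index register; with a different destination, an extra MOV would be needed to keep the source registers intact.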