I have read various optimization guides that claim ADD reg, 1 is faster than INC reg on x86. Is this really true?
After all, both ADD and INC update the flags register. The only difference is that INC doesn't update CF.
The POP instruction removes the 4-byte data element from the top of the hardware-supported stack and places it in the specified operand (i.e., a register or memory location).
A PUSH is a single instruction in x86 that does two things internally: it decrements the ESP register by the size of the pushed value, then stores the pushed value at the address now held in ESP.
The MOV instruction is the most important command in the 8086 because it moves data from one location to another. It also has the widest variety of parameters, so if the assembly programmer can use MOV effectively, the rest of the commands are easier to understand. MOV copies the data in the source to the destination.
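On some micro-architectures, with some instruction streams, INC can incur a "partial flags update stall" because it updates some of the flags while preserving the others. The penalty shows up when a later instruction reads a flag that INC left untouched (such as CF), so the processor has to combine the old and new flag values. ADD sets all of the flags, and so does not risk such a stall.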
ADD is not always faster than INC, but it is almost always at least as fast (there are a few corner cases on certain older micro-architectures, but they are exceedingly rare), and sometimes significantly faster.
For more details, consult Intel's Optimization Reference Manual or Agner Fog's micro-architecture notes.
This isn't a definitive answer, but you can run an experiment. Write this C file:
=== inc.c ===
#include <stdio.h>
int main(int argc, char *argv[])
{
    for (int n = 0; n < 1000; n++) {
        printf("%d\n", n);
    }
    return 0;
}
Then run:
clang -march=native -masm=intel -O3 -S -o inc.clang.s inc.c
gcc -march=native -masm=intel -O3 -S -o inc.gcc.s inc.c
Note the generated assembly code. Relevant clang output:
mov esi, ebx
call printf
inc ebx
cmp ebx, 1000
jne .LBB0_1
Relevant gcc output:
mov edi, 1
inc ebx
call __printf_chk
cmp ebx, 1000
jne .L2
This suggests that the authors of both clang and gcc consider INC the better choice over ADD reg, 1 on modern architectures.
What does that mean for your question? Well, I would trust their judgement over the guides you have read and conclude that INC is just as fast as ADD, and that the one byte saved by its shorter encoding makes it preferable. Compiler authors are just people, so they can be wrong, but it is unlikely. :)
Some more experimentation shows that if you don't use the -march=native option, gcc uses add ebx, 1 instead. Clang, on the other hand, always prefers inc. My conclusion is that when you asked the question in 2012, ADD was sometimes preferable, but now, in 2016, you should always go with INC.