
Is ADD 1 really faster than INC? x86 [duplicate]

I have read various optimization guides that claim ADD 1 is faster than using INC in x86. Is this really true?

Tyler Durden asked Nov 14 '12 16:11



2 Answers

On some micro-architectures, with some instruction streams, INC will incur a "partial flags update stall" (because it updates some of the flags while preserving the others). ADD sets the value of all of the flags, and so does not risk such a stall.

ADD is not always faster than INC, but it is almost always at least as fast (there are a few corner cases on certain older micro-architectures, but they are exceedingly rare), and sometimes significantly faster.

For more details, consult Intel's Optimization Reference Manual or Agner Fog's micro-architecture notes.

Stephen Canon answered Sep 18 '22 09:09


This is not a definitive answer, but you can check what current compilers choose. Write this C file:

=== inc.c ===
#include <stdio.h>
int main(int argc, char *argv[])
{
    for (int n = 0; n < 1000; n++) {
        printf("%d\n", n);
    }
    return 0;
}

Then run:

clang -march=native -masm=intel -O3 -S -o inc.clang.s inc.c
gcc -march=native -masm=intel -O3 -S -o inc.gcc.s inc.c

Note the generated assembly code. Relevant clang output:

mov     esi, ebx
call    printf
inc     ebx
cmp     ebx, 1000
jne     .LBB0_1

Relevant gcc output:

mov     edi, 1
inc     ebx
call    __printf_chk
cmp     ebx, 1000
jne     .L2

This suggests that the authors of both clang and gcc think INC is a better choice than ADD reg, 1 on modern architectures.

What does that mean for your question? I would trust their judgement over the guides you have read and conclude that INC is just as fast as ADD, and that the byte saved by its shorter encoding makes it preferable. Compiler authors are only human, so they can be wrong, but it is unlikely. :)

Some more experimentation shows that without the -march=native option, gcc uses add ebx, 1 instead. Clang, on the other hand, always prefers inc. My conclusion is that when you asked the question in 2012, ADD was sometimes preferable, but now, in 2016, you should always go with INC.

Björn Lindqvist answered Sep 18 '22 09:09