I have read various optimization guides that claim ADD 1 is faster than using INC in x86. Is this really true?

Is ADD 1 really faster than INC? x86 [duplicate]

2 Answers

On some micro-architectures, with some instruction streams, INC will incur a "partial flags update stall" (because it updates some of the flags while preserving the others). ADD sets the value of all of the flags, and so does not risk such a stall.

ADD is not always faster than INC, but it is almost always at least as fast (there are a few corner cases on certain older micro-architectures, but they are exceedingly rare), and sometimes significantly faster.

For more details, consult Intel's Optimization Reference Manual or Agner Fog's micro-architecture notes.

answered Sep 18 '22 09:09

Stephen Canon

This isn't a definitive answer, but you can check what compilers do yourself. Write this C file:

/* inc.c */
#include <stdio.h>
int main(int argc, char *argv[])
{
    for (int n = 0; n < 1000; n++) {
        printf("%d\n", n);
    }
    return 0;
}

Then run:

clang -march=native -masm=intel -O3 -S -o inc.clang.s inc.c
gcc -march=native -masm=intel -O3 -S -o inc.gcc.s inc.c

Note the generated assembly code. Relevant clang output:

mov     esi, ebx
call    printf
inc     ebx
cmp     ebx, 1000
jne     .LBB0_1

Relevant gcc output:

mov     edi, 1
inc     ebx
call    __printf_chk
cmp     ebx, 1000
jne     .L2

This shows that the authors of both clang and gcc consider INC a better choice than ADD reg, 1 on modern architectures.

What does that mean for your question? I would trust their judgement over the guides you have read and conclude that INC is just as fast as ADD, and that the one byte saved by its shorter register encoding makes it preferable. Compiler authors are only human, so they can be wrong, but it is unlikely. :)

Some more experimentation shows that without the -march=native option, gcc emits add ebx, 1 instead, whereas clang always prefers inc. My conclusion is that when you asked the question in 2012, ADD was sometimes preferable, but now, in 2016, you should always go with INC.

answered Sep 18 '22 09:09

Björn Lindqvist



Tags: performance, optimization, x86, assembly

Tyler Durden
