
Why do INC and ADD 1 have different performance? [duplicate]

I've read many times over the years that you should do XOR ax, ax because it is faster, or that when programming in C you should use counter++ or counter+=1 because they would compile to INC or ADD, or that on the NetBurst Pentium 4 INC was slower than ADD 1, so the compiler had to be told that your target was NetBurst so that it would translate every var++ to ADD 1.

My question is: why do INC and ADD have different performance? Why, for example, was INC claimed to be slower on NetBurst while being faster than ADD on other processors?

speeder asked Aug 28 '12 16:08



2 Answers

For the x86 architecture, INC updates only a subset of the condition codes (it leaves the carry flag unchanged), whereas ADD updates the entire set. (Other architectures have different rules, so this discussion may or may not apply.)

So an INC instruction must wait for earlier instructions that update the condition code bits to finish before it can merge its own changes into that previous value and produce its final condition code result.

ADD can produce final condition code bits without regard to previous values of the condition codes, so it doesn't need to wait for previous instructions to finish computing their value of the condition codes.

Consequence: you can execute ADD in parallel with lots of other instructions, and INC with fewer other instructions. Thus, ADD appears to be faster in practice.
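The flag behaviour described above can be sketched as a toy model in C. This is only an illustration, not how any real CPU is implemented: the `Flags` struct and the `model_add1` / `model_inc` functions are hypothetical names invented for this sketch, and only three flags are modelled. The point it shows is that the ADD model never reads the old flags, while the INC model's final flag state depends on the carry flag it inherited.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a few x86 status flags (hypothetical, not real hardware state). */
typedef struct {
    bool cf; /* carry flag */
    bool zf; /* zero flag  */
    bool sf; /* sign flag  */
} Flags;

/* Models ADD reg, 1: every modelled flag is recomputed from the result,
   so the previous contents of *f are never read. */
uint32_t model_add1(uint32_t x, Flags *f) {
    uint32_t r = x + 1u;
    f->cf = (r == 0u);        /* carry out only when x was 0xFFFFFFFF */
    f->zf = (r == 0u);
    f->sf = (r >> 31) != 0u;
    return r;
}

/* Models INC reg: ZF and SF are recomputed, but CF keeps whatever value
   an earlier instruction left there, so the new flag state depends on
   the old one. */
uint32_t model_inc(uint32_t x, Flags *f) {
    uint32_t r = x + 1u;
    /* f->cf is deliberately left untouched */
    f->zf = (r == 0u);
    f->sf = (r >> 31) != 0u;
    return r;
}
```

In this model, calling `model_inc` with `cf` already set leaves it set, while `model_add1` overwrites it. That dependence on the previous flag value is what the answer identifies as the reason INC may have to wait.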

(I believe there is a similar issue when working with 8-bit registers (e.g., AL) in the context of full-width registers (e.g., EAX), in that an AL update requires that previous EAX updates complete first.)

I don't use INC or DEC in my high-performance assembly code anymore. If you aren't ultrasensitive to execution times, then INC or DEC is just fine and can reduce the size of your instruction stream.

Ira Baxter answered Nov 15 '22 23:11



The XOR ax, ax bit is, I gather, a few years out of date, and assigning zero now beats it (so I'm told).

The C bit about counter++ rather than counter+=1 is a couple of decades out of date. Definitely.

The simple reason for the first one, with assembly, is that every instruction is translated into some sort of operation on the part of the CPU, and while the designers will always try to make everything as fast as possible, they'll do a better job with some than with others. It's not hard to imagine how an INC could be faster, since it only has to deal with one register, though that's grossly over-simplifying (but I don't know much about these things, so over-simplifying is all I can do on that part).

The C one, though, has long been nonsense. If we have a particular CPU where INC beats ADD, why on earth would the compiler designer not use INC instead of ADD for both counter++ and counter+=1? Compilers do a lot of optimisations, and that sort of change is far from the most complicated.
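A minimal way to check this yourself, sketched under the assumption that you have a C compiler at hand: write the two spellings as separate functions and compare the assembly the compiler emits for each (with GCC, the `-S` option writes assembly instead of an object file). The function names `bump_postfix` and `bump_compound` are hypothetical, chosen only for this illustration.

```c
/* Two spellings of "add one to counter". In C they have identical
   semantics when the value of the expression itself is not used, so an
   optimising compiler is free to pick the same instruction for both. */

int bump_postfix(int counter) {
    counter++;          /* increment written with the ++ operator */
    return counter;
}

int bump_compound(int counter) {
    counter += 1;       /* increment written as compound assignment */
    return counter;
}
```

Which instruction actually appears depends on the compiler, the optimisation level, and the target CPU it is tuned for; the expectation is only that both functions end up with the same code, not that it is specifically INC or ADD.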

Jon Hanna answered Nov 16 '22 00:11
