Some people write such code when they need a loop without a counter or with a n-1, ..., 0
counter:
while (i--) { ... }
A specific example:
volatile int sink;
void countdown_i_used() {
unsigned i = 1000;
while (i--) {
sink = i; // if i is unused, gcc optimizes it away and uses dec/jnz
}
}
On GCC 8.2 (on the Godbolt compiler explorer), it's compiled into
# gcc8.2 -O3 -march=haswell
.L2:
mov DWORD PTR sink[rip], eax
dec eax # with tune=generic, sub eax, 1
cmp eax, -1
jne .L2
On clang (https://godbolt.org/z/YxYZ95), if the counter is not used it turns into
if(i) do {...} while(--i);
but if used, like GCC it turns into
add esi, -1
cmp esi, -1
jnz lp
However, this seems to be a better idea:
sub esi, 1
jnc lp
Why don't these two compilers use this way?
Because the cmp
way is better? Or because they won't save space this way and they are almost the same speed?
Or do they just not consider this option?
Update: Even if I write code to use the carry way (Here I use add/jc
but it's same)
bool addcy(unsigned& a, unsigned b) {
unsigned last_a = a;
a+=b;
return last_a+b<last_a;
}
volatile unsigned sink;
void f() {
for (unsigned i=100; addcy(i, -1); ) {
sink = i;
}
}
compiler still compile it as checking equality to -1. However, if the 100 is replaced with an input, the JC code remain
Yes, this is a missed optimization. Intel Sandybridge-family can macro-fuse sub/jcc into a single uop, so sub/jnc saves code-size, x86 instructions, and uops on those CPUs.
On other CPUs (e.g. AMD which can only fuse test/cmp with jcc), this still saves code size so it's at least slightly better. It's not worse on anything.
It would be a good idea to report missed-optimization bugs on https://bugs.llvm.org and https://gcc.gnu.org/bugzilla/.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With