while(i--) optimization by gcc and clang: why don't they use sub / jnc?

Question

Some people write such code when they need a loop without a counter or with a n-1, ..., 0 counter:

while (i--) { ... }

A specific example:

volatile int sink;
void countdown_i_used() {
    unsigned i = 1000;
    while (i--) {
         sink = i;  // if i is unused, gcc optimizes it away and uses dec/jnz
    }
}

On GCC 8.2 (on the Godbolt compiler explorer), it's compiled into

# gcc8.2 -O3 -march=haswell
.L2:
    mov     DWORD PTR sink[rip], eax
    dec     eax                      # with tune=generic,  sub eax, 1
    cmp     eax, -1
    jne     .L2

On clang (https://godbolt.org/z/YxYZ95), if the counter is not used it turns into

if(i) do {...} while(--i);

but if used, like GCC it turns into

add esi, -1
cmp esi, -1
jnz lp

However, this seems to be a better idea:

sub esi, 1
jnc lp

Why don't these two compilers use this way?

Because the cmp way is better? Or because they won't save space this way and they are almost the same speed?

Or do they just not consider this option?

Update: Even if I write code to use the carry way (Here I use add/jc but it's same)

bool addcy(unsigned& a, unsigned b) {
    unsigned last_a = a;
    a+=b;
    return last_a+b<last_a;
}
volatile unsigned sink;
void f() {

    for (unsigned i=100; addcy(i, -1); ) {
        sink = i;
    }
}

compiler still compile it as checking equality to -1. However, if the 100 is replaced with an input, the JC code remain

Peter Cordes · Accepted Answer

Yes, this is a missed optimization. Intel Sandybridge-family can macro-fuse sub/jcc into a single uop, so sub/jnc saves code-size, x86 instructions, and uops on those CPUs.

On other CPUs (e.g. AMD which can only fuse test/cmp with jcc), this still saves code size so it's at least slightly better. It's not worse on anything.

It would be a good idea to report missed-optimization bugs on https://bugs.llvm.org and https://gcc.gnu.org/bugzilla/.

while(i--) optimization by gcc and clang: why don't they use sub / jnc?

Tags:

performance

c

x86

gcc

assembly

l4m2

1 Answers

Peter Cordes

Recent Activity

Donate For Us

while(i--) optimization by gcc and clang: why don't they use sub / jnc?

Tags:

performance

c

x86

gcc

assembly

l4m2

1 Answers

Peter Cordes

Related questions

Recent Activity

Donate For Us