Say I have a loop that looks like this:
for(int i = 0; i < 10000; i++) {
    /* Do something computationally expensive */
    if (i < 200 && !(i%20)) {
        /* Do something else */
    }
}
wherein some trivial task gets stuck behind an if-statement that only runs a handful of times. I've always heard that "if-statements in loops are slow!" So, in the hopes of (marginally) increased performance, I split the loop apart into:
for(int i = 0; i < 200; i++) {
    /* Do something computationally expensive */
    if (!(i%20)) {
        /* Do something else */
    }
}
for(int i = 200; i < 10000; i++) {
    /* Do something computationally expensive */
}
Will gcc (with the appropriate flags, like -O3) automatically break the one loop into two, or does it only unroll to decrease the number of iterations?
Loop unrolling is a tradeoff between code size and speed, so its effectiveness depends heavily on the unrolling factor, that is, the number of times the loop body is replicated. As the factor increases, code size grows, which can lead to front-end problems such as instruction-cache misses.
With -funroll-loops the compiler decides heuristically which loops to unroll. If you want to force unrolling you can use -funroll-all-loops, but that usually makes the code run slower.
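For illustration, here is what unrolling by a factor of 4 means conceptually. This is just a sketch (the array a, its length n, and the accumulator sum are placeholders), and it assumes n is a multiple of 4; in the general case the compiler emits an extra cleanup loop for the leftover iterations:

/* original loop: one test and branch per element */
for (int i = 0; i < n; i++)
    sum += a[i];

/* unrolled by a factor of 4: one test and branch per four
   elements, at the cost of a larger loop body */
for (int i = 0; i < n; i += 4) {
    sum += a[i];
    sum += a[i + 1];
    sum += a[i + 2];
    sum += a[i + 3];
}

Note that this is a different transformation from the one the question asks about: unrolling replicates the body to reduce the iteration count; it does not peel the first 200 iterations off into a separate loop.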
Why not just disassemble the program and see for yourself? But here we go. This is the test program:
#include <stdio.h>

int main() {
    int sum = 0;
    int i;
    for (i = 0; i < 10000; i++) {
        if (i < 200 && !(i % 20)) {
            sum += 0xC0DE;
        }
        sum += 0xCAFE;
    }
    printf("%d\n", sum);
    return 0;
}
and this is the interesting part of the disassembled code, compiled with gcc 4.3.3 and -O3:
0x08048404 <main+20>: xor ebx,ebx
0x08048406 <main+22>: push ecx
0x08048407 <main+23>: xor ecx,ecx
0x08048409 <main+25>: sub esp,0xc
0x0804840c <main+28>: lea esi,[esi+eiz*1+0x0]
0x08048410 <main+32>: cmp ecx,0xc7
0x08048416 <main+38>: jg 0x8048436 <main+70>
0x08048418 <main+40>: mov eax,ecx
0x0804841a <main+42>: imul esi
0x0804841c <main+44>: mov eax,ecx
0x0804841e <main+46>: sar eax,0x1f
0x08048421 <main+49>: sar edx,0x3
0x08048424 <main+52>: sub edx,eax
0x08048426 <main+54>: lea edx,[edx+edx*4]
0x08048429 <main+57>: shl edx,0x2
0x0804842c <main+60>: cmp ecx,edx
0x0804842e <main+62>: jne 0x8048436 <main+70>
0x08048430 <main+64>: add ebx,0xc0de
0x08048436 <main+70>: add ecx,0x1
0x08048439 <main+73>: add ebx,0xcafe
0x0804843f <main+79>: cmp ecx,0x2710
0x08048445 <main+85>: jne 0x8048410 <main+32>
0x08048447 <main+87>: mov DWORD PTR [esp+0x8],ebx
0x0804844b <main+91>: mov DWORD PTR [esp+0x4],0x8048530
0x08048453 <main+99>: mov DWORD PTR [esp],0x1
0x0804845a <main+106>: call 0x8048308 <__printf_chk@plt>
So as we see, for this particular example: no, it does not. There is only one loop, starting at main+32 and ending at main+85. If you have trouble reading the assembly: ecx = i and ebx = sum.
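If the imul/sar/lea sequence looks cryptic: gcc has replaced the division hidden in i % 20 with a multiplication by a reciprocal constant. Here is a rough C equivalent of the loop body. The multiplier 0x66666667 is an assumption, since esi is loaded before the snippet shown, but it is the usual magic number gcc uses for dividing by 20:

/* sum lives in ebx, i in ecx */
int sum = 0;
for (int i = 0; i < 10000; i++) {
    if (i <= 0xC7) {                       /* cmp ecx,0xc7; jg       */
        /* imul esi; sar edx,0x3; sub edx,eax: compute q = i / 20
           via the reciprocal multiplier assumed to be in esi        */
        int q = (int)(((long long)i * 0x66666667) >> 35) - (i >> 31);
        /* lea + shl rebuild q*20; jne skips unless i % 20 == 0      */
        if (i == q * 20)
            sum += 0xC0DE;                 /* add ebx,0xc0de         */
    }
    sum += 0xCAFE;                         /* add ebx,0xcafe         */
}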
But still, your mileage may vary: who knows what heuristics are used for this particular case? You'll have to compile the code you have in mind and see how longer or more complicated computations influence the optimizer.
That said, on any modern CPU the branch predictor will do pretty well on such easy code, so you won't see much of a performance loss either way. What is the cost of maybe a handful of mispredictions if your computationally intensive code needs billions of cycles?
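If you want to measure rather than guess, a minimal benchmark sketch along these lines will do. Here work() is a hypothetical stand-in for the expensive computation, and with aggressive optimization you may need a more elaborate harness to keep the compiler from folding the loops away entirely:

#include <stdio.h>
#include <time.h>

/* hypothetical stand-in for the computationally expensive part */
static int work(int x) {
    return x * x + 1;
}

static int single_loop(void) {
    int sum = 0;
    for (int i = 0; i < 10000; i++) {
        sum += work(i);
        if (i < 200 && !(i % 20))
            sum += 0xC0DE;
    }
    return sum;
}

static int split_loops(void) {
    int sum = 0;
    for (int i = 0; i < 200; i++) {
        sum += work(i);
        if (!(i % 20))
            sum += 0xC0DE;
    }
    for (int i = 200; i < 10000; i++)
        sum += work(i);
    return sum;
}

int main(void) {
    volatile int sink = 0;   /* force the results to be kept */
    clock_t t0 = clock();
    for (int r = 0; r < 100000; r++)
        sink += single_loop();
    clock_t t1 = clock();
    for (int r = 0; r < 100000; r++)
        sink += split_loops();
    clock_t t2 = clock();
    printf("single: %.3fs  split: %.3fs  (sink=%d)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sink);
    return 0;
}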