Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why code with multiple nested loops can finish immediately on GCC but take forever on VS?

long long r = 0;
long long k = 0;
for (; k < 9999999999999; k++) 
{
    for (long long i = 0; i < 9999999999999; i++) 
    {
        for (long long j = 0; j < 9999999999999; j++) 
        {
            r = (r + (i * j) % 100) % 47;
            if (r != 0) 
            {
                r++;
            }
        }
    }
 }

This code executes on i3Core in 0.000001 wall seconds, checked with boost::timer::auto_cpu_timer on i7Core.

But with visual studio 2010 it seems to run in infinite time.

What is wrong with GCC or VS ? Is GCC optimizing too much?

like image 638
Denis Ermolin Avatar asked Mar 13 '12 06:03

Denis Ermolin


People also ask

How do you avoid multiple nested loops?

Avoid nested loops with itertools. You can use itertools. product() to get all combinations of multiple lists in one loop and get the same result as nested loops. Since it is a single loop, you can simply break under the desired conditions. Adding the argument of itertools.

Does break Statement break out of all loops?

In a nested loop, a break statement only stops the loop it is placed in. Therefore, if a break is placed in the inner loop, the outer loop still continues. However, if the break is placed in the outer loop, all of the looping stops.

Is a nested loop always bad?

Nested iterations are not necessarily a bad thing. Even many well-known algorithms rely on them. But you have to be extremely cautious what you execute in the deepest loop, since these commands will be performed very often.

Does GCC optimize by default?

GCC has a range of optimization levels, plus individual options to enable or disable particular optimizations. The overall compiler optimization level is controlled by the command line option -On, where n is the required optimization level, as follows: -O0 . (default).


2 Answers

Yes, GCC is optimizing that code.

Specifically, it knows that you aren't using the result, so it's removing all of it.
(You're never using the variable r.)

This is called Dead Code Elimination.

To prevent the compiler from optimizing it out, you'll need to use the result somehow. Try printing r out at the end:

cout << r << endl;

However, I warn that you'll need to reduce the iteration counts, or it probably won't finish in your lifetime.


I just tested this in VS2010 x64. Looking at the assembly, it is clear that VS2010 is not able to optimize out the entire loop.

It goes to show that different compilers vary in their ability to optimize different things.


Related, and more in-depth: How does GCC optimize out an unused variable incremented inside a loop?

like image 183
Mysticial Avatar answered Nov 14 '22 21:11

Mysticial


You are running undefined behaviour

Your code has undefined behavior at i * j on most platforms because:

9999999999999 * 9999999999999 > 2^64

and long long is 2^64 on most platforms as there is simply too little hardware support for 128-bit integers.

Thanks to GCC -O3 for pointing this out in a warning: it can suppose that UB does not happen and run the loop less times. See also: Why does this loop produce "warning: iteration 3u invokes undefined behavior" and output more than 4 lines?

So anything can happen.

Let's decompile it to see what is really happening

GCC 4.8:

gcc -c -g -std=c99 -O0 a.c
objudmp -S a.o

Output:

a.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
int main() {
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
    long long r = 0;
   4:   48 c7 45 e0 00 00 00    movq   $0x0,-0x20(%rbp)
   b:   00 
    long long k = 0;
   c:   48 c7 45 e8 00 00 00    movq   $0x0,-0x18(%rbp)
  13:   00 
    for (; k < 9999999999999; k++) 
  14:   e9 f8 00 00 00          jmpq   111 <main+0x111>
    {
        for (long long i = 0; i < 9999999999999; i++) 
  19:   48 c7 45 f0 00 00 00    movq   $0x0,-0x10(%rbp)
  20:   00 
  21:   e9 d2 00 00 00          jmpq   f8 <main+0xf8>
        {
            for (long long j = 0; j < 9999999999999; j++) 
  26:   48 c7 45 f8 00 00 00    movq   $0x0,-0x8(%rbp)
  2d:   00 
  2e:   e9 ac 00 00 00          jmpq   df <main+0xdf>
            {
                r = (r + (i * j) % 100) % 47;
  33:   48 8b 45 f0             mov    -0x10(%rbp),%rax
  37:   48 0f af 45 f8          imul   -0x8(%rbp),%rax
  3c:   48 89 c1                mov    %rax,%rcx
  3f:   48 ba 0b d7 a3 70 3d    movabs $0xa3d70a3d70a3d70b,%rdx
  46:   0a d7 a3 
  49:   48 89 c8                mov    %rcx,%rax
  4c:   48 f7 ea                imul   %rdx
  4f:   48 8d 04 0a             lea    (%rdx,%rcx,1),%rax
  53:   48 c1 f8 06             sar    $0x6,%rax
  57:   48 89 c2                mov    %rax,%rdx
  5a:   48 89 c8                mov    %rcx,%rax
  5d:   48 c1 f8 3f             sar    $0x3f,%rax
  61:   48 29 c2                sub    %rax,%rdx
  64:   48 89 d0                mov    %rdx,%rax
  67:   48 c1 e0 02             shl    $0x2,%rax
  6b:   48 01 d0                add    %rdx,%rax
  6e:   48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
  75:   00 
  76:   48 01 d0                add    %rdx,%rax
  79:   48 c1 e0 02             shl    $0x2,%rax
  7d:   48 29 c1                sub    %rax,%rcx
  80:   48 89 ca                mov    %rcx,%rdx
  83:   48 8b 45 e0             mov    -0x20(%rbp),%rax
  87:   48 8d 0c 02             lea    (%rdx,%rax,1),%rcx
  8b:   48 ba 99 5c 41 4c ae    movabs $0x572620ae4c415c99,%rdx
  92:   20 26 57 
  95:   48 89 c8                mov    %rcx,%rax
  98:   48 f7 ea                imul   %rdx
  9b:   48 c1 fa 04             sar    $0x4,%rdx
  9f:   48 89 c8                mov    %rcx,%rax
  a2:   48 c1 f8 3f             sar    $0x3f,%rax
  a6:   48 29 c2                sub    %rax,%rdx
  a9:   48 89 d0                mov    %rdx,%rax
  ac:   48 89 45 e0             mov    %rax,-0x20(%rbp)
  b0:   48 8b 55 e0             mov    -0x20(%rbp),%rdx
  b4:   48 89 d0                mov    %rdx,%rax
  b7:   48 01 c0                add    %rax,%rax
  ba:   48 01 d0                add    %rdx,%rax
  bd:   48 c1 e0 04             shl    $0x4,%rax
  c1:   48 29 d0                sub    %rdx,%rax
  c4:   48 29 c1                sub    %rax,%rcx
  c7:   48 89 c8                mov    %rcx,%rax
  ca:   48 89 45 e0             mov    %rax,-0x20(%rbp)
                if (r != 0) 
  ce:   48 83 7d e0 00          cmpq   $0x0,-0x20(%rbp)
  d3:   74 05                   je     da <main+0xda>
                {
                    r++;
  d5:   48 83 45 e0 01          addq   $0x1,-0x20(%rbp)
    long long k = 0;
    for (; k < 9999999999999; k++) 
    {
        for (long long i = 0; i < 9999999999999; i++) 
        {
            for (long long j = 0; j < 9999999999999; j++) 
  da:   48 83 45 f8 01          addq   $0x1,-0x8(%rbp)
  df:   48 b8 fe 9f 72 4e 18    movabs $0x9184e729ffe,%rax
  e6:   09 00 00 
  e9:   48 39 45 f8             cmp    %rax,-0x8(%rbp)
  ed:   0f 8e 40 ff ff ff       jle    33 <main+0x33>
int main() {
    long long r = 0;
    long long k = 0;
    for (; k < 9999999999999; k++) 
    {
        for (long long i = 0; i < 9999999999999; i++) 
  f3:   48 83 45 f0 01          addq   $0x1,-0x10(%rbp)
  f8:   48 b8 fe 9f 72 4e 18    movabs $0x9184e729ffe,%rax
  ff:   09 00 00 
 102:   48 39 45 f0             cmp    %rax,-0x10(%rbp)
 106:   0f 8e 1a ff ff ff       jle    26 <main+0x26>
int main() {
    long long r = 0;
    long long k = 0;
    for (; k < 9999999999999; k++) 
 10c:   48 83 45 e8 01          addq   $0x1,-0x18(%rbp)
 111:   48 b8 fe 9f 72 4e 18    movabs $0x9184e729ffe,%rax
 118:   09 00 00 
 11b:   48 39 45 e8             cmp    %rax,-0x18(%rbp)
 11f:   0f 8e f4 fe ff ff       jle    19 <main+0x19>
 125:   b8 00 00 00 00          mov    $0x0,%eax
                    r++;
                }
            }
        }
    }
}
 12a:   5d                      pop    %rbp
 12b:   c3                      retq   

so all the loops seems present.

With -O3:

0000000000000000 <main>:
   0:   31 c0                   xor    %eax,%eax
   2:   c3                      retq

so it was clearly optimized away. Nothing in the C standard prevents that optimization, so GCC does it.

If we add a printf("%lld", r); at the end as mentioned by Mystical, it does take forever even with -O3, and the generated optimized code is almost the same as the non optimized one.

For infinite loops it gets more interesting: Are compilers allowed to eliminate infinite loops?

How to prevent if from happening: How to prevent GCC from optimizing out a busy wait loop?