Why code with multiple nested loops can finish immediately on GCC but take forever on VS?

Tags:

long long r = 0;
long long k = 0;
for (; k < 9999999999999; k++) 
{
    for (long long i = 0; i < 9999999999999; i++) 
    {
        for (long long j = 0; j < 9999999999999; j++) 
        {
            r = (r + (i * j) % 100) % 47;
            if (r != 0) 
            {
                r++;
            }
        }
    }
 }

This code executes on i3Core in 0.000001 wall seconds, checked with boost::timer::auto_cpu_timer on i7Core.

But with visual studio 2010 it seems to run in infinite time.

What is wrong with GCC or VS ? Is GCC optimizing too much?

638

asked Mar 13 '12 06:03

2 Answers

Yes, GCC is optimizing that code.

Specifically, it knows that you aren't using the result, so it's removing all of it.
(You're never using the variable r.)

This is called Dead Code Elimination.

To prevent the compiler from optimizing it out, you'll need to use the result somehow. Try printing r out at the end:

cout << r << endl;

However, I warn that you'll need to reduce the iteration counts, or it probably won't finish in your lifetime.

I just tested this in VS2010 x64. Looking at the assembly, it is clear that VS2010 is not able to optimize out the entire loop.

It goes to show that different compilers vary in their ability to optimize different things.

Related, and more in-depth: How does GCC optimize out an unused variable incremented inside a loop?

183

answered Nov 14 '22 21:11

Mysticial

You are running undefined behaviour

Your code has undefined behavior at i * j on most platforms because:

9999999999999 * 9999999999999 > 2^64

and long long is 2^64 on most platforms as there is simply too little hardware support for 128-bit integers.

Thanks to GCC -O3 for pointing this out in a warning: it can suppose that UB does not happen and run the loop less times. See also: Why does this loop produce "warning: iteration 3u invokes undefined behavior" and output more than 4 lines?

So anything can happen.

Let's decompile it to see what is really happening

GCC 4.8:

gcc -c -g -std=c99 -O0 a.c
objudmp -S a.o

Output:

a.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
int main() {
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
    long long r = 0;
   4:   48 c7 45 e0 00 00 00    movq   $0x0,-0x20(%rbp)
   b:   00 
    long long k = 0;
   c:   48 c7 45 e8 00 00 00    movq   $0x0,-0x18(%rbp)
  13:   00 
    for (; k < 9999999999999; k++) 
  14:   e9 f8 00 00 00          jmpq   111 <main+0x111>
    {
        for (long long i = 0; i < 9999999999999; i++) 
  19:   48 c7 45 f0 00 00 00    movq   $0x0,-0x10(%rbp)
  20:   00 
  21:   e9 d2 00 00 00          jmpq   f8 <main+0xf8>
        {
            for (long long j = 0; j < 9999999999999; j++) 
  26:   48 c7 45 f8 00 00 00    movq   $0x0,-0x8(%rbp)
  2d:   00 
  2e:   e9 ac 00 00 00          jmpq   df <main+0xdf>
            {
                r = (r + (i * j) % 100) % 47;
  33:   48 8b 45 f0             mov    -0x10(%rbp),%rax
  37:   48 0f af 45 f8          imul   -0x8(%rbp),%rax
  3c:   48 89 c1                mov    %rax,%rcx
  3f:   48 ba 0b d7 a3 70 3d    movabs $0xa3d70a3d70a3d70b,%rdx
  46:   0a d7 a3 
  49:   48 89 c8                mov    %rcx,%rax
  4c:   48 f7 ea                imul   %rdx
  4f:   48 8d 04 0a             lea    (%rdx,%rcx,1),%rax
  53:   48 c1 f8 06             sar    $0x6,%rax
  57:   48 89 c2                mov    %rax,%rdx
  5a:   48 89 c8                mov    %rcx,%rax
  5d:   48 c1 f8 3f             sar    $0x3f,%rax
  61:   48 29 c2                sub    %rax,%rdx
  64:   48 89 d0                mov    %rdx,%rax
  67:   48 c1 e0 02             shl    $0x2,%rax
  6b:   48 01 d0                add    %rdx,%rax
  6e:   48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
  75:   00 
  76:   48 01 d0                add    %rdx,%rax
  79:   48 c1 e0 02             shl    $0x2,%rax
  7d:   48 29 c1                sub    %rax,%rcx
  80:   48 89 ca                mov    %rcx,%rdx
  83:   48 8b 45 e0             mov    -0x20(%rbp),%rax
  87:   48 8d 0c 02             lea    (%rdx,%rax,1),%rcx
  8b:   48 ba 99 5c 41 4c ae    movabs $0x572620ae4c415c99,%rdx
  92:   20 26 57 
  95:   48 89 c8                mov    %rcx,%rax
  98:   48 f7 ea                imul   %rdx
  9b:   48 c1 fa 04             sar    $0x4,%rdx
  9f:   48 89 c8                mov    %rcx,%rax
  a2:   48 c1 f8 3f             sar    $0x3f,%rax
  a6:   48 29 c2                sub    %rax,%rdx
  a9:   48 89 d0                mov    %rdx,%rax
  ac:   48 89 45 e0             mov    %rax,-0x20(%rbp)
  b0:   48 8b 55 e0             mov    -0x20(%rbp),%rdx
  b4:   48 89 d0                mov    %rdx,%rax
  b7:   48 01 c0                add    %rax,%rax
  ba:   48 01 d0                add    %rdx,%rax
  bd:   48 c1 e0 04             shl    $0x4,%rax
  c1:   48 29 d0                sub    %rdx,%rax
  c4:   48 29 c1                sub    %rax,%rcx
  c7:   48 89 c8                mov    %rcx,%rax
  ca:   48 89 45 e0             mov    %rax,-0x20(%rbp)
                if (r != 0) 
  ce:   48 83 7d e0 00          cmpq   $0x0,-0x20(%rbp)
  d3:   74 05                   je     da <main+0xda>
                {
                    r++;
  d5:   48 83 45 e0 01          addq   $0x1,-0x20(%rbp)
    long long k = 0;
    for (; k < 9999999999999; k++) 
    {
        for (long long i = 0; i < 9999999999999; i++) 
        {
            for (long long j = 0; j < 9999999999999; j++) 
  da:   48 83 45 f8 01          addq   $0x1,-0x8(%rbp)
  df:   48 b8 fe 9f 72 4e 18    movabs $0x9184e729ffe,%rax
  e6:   09 00 00 
  e9:   48 39 45 f8             cmp    %rax,-0x8(%rbp)
  ed:   0f 8e 40 ff ff ff       jle    33 <main+0x33>
int main() {
    long long r = 0;
    long long k = 0;
    for (; k < 9999999999999; k++) 
    {
        for (long long i = 0; i < 9999999999999; i++) 
  f3:   48 83 45 f0 01          addq   $0x1,-0x10(%rbp)
  f8:   48 b8 fe 9f 72 4e 18    movabs $0x9184e729ffe,%rax
  ff:   09 00 00 
 102:   48 39 45 f0             cmp    %rax,-0x10(%rbp)
 106:   0f 8e 1a ff ff ff       jle    26 <main+0x26>
int main() {
    long long r = 0;
    long long k = 0;
    for (; k < 9999999999999; k++) 
 10c:   48 83 45 e8 01          addq   $0x1,-0x18(%rbp)
 111:   48 b8 fe 9f 72 4e 18    movabs $0x9184e729ffe,%rax
 118:   09 00 00 
 11b:   48 39 45 e8             cmp    %rax,-0x18(%rbp)
 11f:   0f 8e f4 fe ff ff       jle    19 <main+0x19>
 125:   b8 00 00 00 00          mov    $0x0,%eax
                    r++;
                }
            }
        }
    }
}
 12a:   5d                      pop    %rbp
 12b:   c3                      retq

so all the loops seems present.

With -O3:

0000000000000000 <main>:
   0:   31 c0                   xor    %eax,%eax
   2:   c3                      retq

so it was clearly optimized away. Nothing in the C standard prevents that optimization, so GCC does it.

If we add a printf("%lld", r); at the end as mentioned by Mystical, it does take forever even with -O3, and the generated optimized code is almost the same as the non optimized one.

For infinite loops it gets more interesting: Are compilers allowed to eliminate infinite loops?

How to prevent if from happening: How to prevent GCC from optimizing out a busy wait loop?

answered Nov 14 '22 20:11

Ciro Santilli 新疆再教育营六四事件法轮功郝海东

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why code with multiple nested loops can finish immediately on GCC but take forever on VS?

Tags:

c++

optimization

visual-studio

compiler-optimization

gcc

Denis Ermolin

People also ask

2 Answers

Mysticial

Ciro Santilli 新疆再教育营六四事件法轮功郝海东

Recent Activity

Donate For Us

Why code with multiple nested loops can finish immediately on GCC but take forever on VS?

Tags:

c++

optimization

visual-studio

compiler-optimization

gcc

Denis Ermolin

People also ask

2 Answers

Mysticial

Ciro Santilli 新疆再教育营六四事件法轮功郝海东

Related questions

Recent Activity

Donate For Us