long long r = 0;
long long k = 0;
for (; k < 9999999999999; k++)
{
for (long long i = 0; i < 9999999999999; i++)
{
for (long long j = 0; j < 9999999999999; j++)
{
r = (r + (i * j) % 100) % 47;
if (r != 0)
{
r++;
}
}
}
}
This code executes on i3Core in 0.000001 wall seconds, checked with boost::timer::auto_cpu_timer
on i7Core.
But with visual studio 2010 it seems to run in infinite time.
What is wrong with GCC or VS ? Is GCC optimizing too much?
Avoid nested loops with itertools. You can use itertools. product() to get all combinations of multiple lists in one loop and get the same result as nested loops. Since it is a single loop, you can simply break under the desired conditions. Adding the argument of itertools.
In a nested loop, a break statement only stops the loop it is placed in. Therefore, if a break is placed in the inner loop, the outer loop still continues. However, if the break is placed in the outer loop, all of the looping stops.
Nested iterations are not necessarily a bad thing. Even many well-known algorithms rely on them. But you have to be extremely cautious what you execute in the deepest loop, since these commands will be performed very often.
GCC has a range of optimization levels, plus individual options to enable or disable particular optimizations. The overall compiler optimization level is controlled by the command line option -On, where n is the required optimization level, as follows: -O0 . (default).
Yes, GCC is optimizing that code.
Specifically, it knows that you aren't using the result, so it's removing all of it.
(You're never using the variable r
.)
This is called Dead Code Elimination.
To prevent the compiler from optimizing it out, you'll need to use the result somehow. Try printing r
out at the end:
cout << r << endl;
However, I warn that you'll need to reduce the iteration counts, or it probably won't finish in your lifetime.
I just tested this in VS2010 x64. Looking at the assembly, it is clear that VS2010 is not able to optimize out the entire loop.
It goes to show that different compilers vary in their ability to optimize different things.
Related, and more in-depth: How does GCC optimize out an unused variable incremented inside a loop?
You are running undefined behaviour
Your code has undefined behavior at i * j
on most platforms because:
9999999999999 * 9999999999999 > 2^64
and long long
is 2^64
on most platforms as there is simply too little hardware support for 128-bit integers.
Thanks to GCC -O3 for pointing this out in a warning: it can suppose that UB does not happen and run the loop less times. See also: Why does this loop produce "warning: iteration 3u invokes undefined behavior" and output more than 4 lines?
So anything can happen.
Let's decompile it to see what is really happening
GCC 4.8:
gcc -c -g -std=c99 -O0 a.c
objudmp -S a.o
Output:
a.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <main>:
int main() {
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
long long r = 0;
4: 48 c7 45 e0 00 00 00 movq $0x0,-0x20(%rbp)
b: 00
long long k = 0;
c: 48 c7 45 e8 00 00 00 movq $0x0,-0x18(%rbp)
13: 00
for (; k < 9999999999999; k++)
14: e9 f8 00 00 00 jmpq 111 <main+0x111>
{
for (long long i = 0; i < 9999999999999; i++)
19: 48 c7 45 f0 00 00 00 movq $0x0,-0x10(%rbp)
20: 00
21: e9 d2 00 00 00 jmpq f8 <main+0xf8>
{
for (long long j = 0; j < 9999999999999; j++)
26: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
2d: 00
2e: e9 ac 00 00 00 jmpq df <main+0xdf>
{
r = (r + (i * j) % 100) % 47;
33: 48 8b 45 f0 mov -0x10(%rbp),%rax
37: 48 0f af 45 f8 imul -0x8(%rbp),%rax
3c: 48 89 c1 mov %rax,%rcx
3f: 48 ba 0b d7 a3 70 3d movabs $0xa3d70a3d70a3d70b,%rdx
46: 0a d7 a3
49: 48 89 c8 mov %rcx,%rax
4c: 48 f7 ea imul %rdx
4f: 48 8d 04 0a lea (%rdx,%rcx,1),%rax
53: 48 c1 f8 06 sar $0x6,%rax
57: 48 89 c2 mov %rax,%rdx
5a: 48 89 c8 mov %rcx,%rax
5d: 48 c1 f8 3f sar $0x3f,%rax
61: 48 29 c2 sub %rax,%rdx
64: 48 89 d0 mov %rdx,%rax
67: 48 c1 e0 02 shl $0x2,%rax
6b: 48 01 d0 add %rdx,%rax
6e: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
75: 00
76: 48 01 d0 add %rdx,%rax
79: 48 c1 e0 02 shl $0x2,%rax
7d: 48 29 c1 sub %rax,%rcx
80: 48 89 ca mov %rcx,%rdx
83: 48 8b 45 e0 mov -0x20(%rbp),%rax
87: 48 8d 0c 02 lea (%rdx,%rax,1),%rcx
8b: 48 ba 99 5c 41 4c ae movabs $0x572620ae4c415c99,%rdx
92: 20 26 57
95: 48 89 c8 mov %rcx,%rax
98: 48 f7 ea imul %rdx
9b: 48 c1 fa 04 sar $0x4,%rdx
9f: 48 89 c8 mov %rcx,%rax
a2: 48 c1 f8 3f sar $0x3f,%rax
a6: 48 29 c2 sub %rax,%rdx
a9: 48 89 d0 mov %rdx,%rax
ac: 48 89 45 e0 mov %rax,-0x20(%rbp)
b0: 48 8b 55 e0 mov -0x20(%rbp),%rdx
b4: 48 89 d0 mov %rdx,%rax
b7: 48 01 c0 add %rax,%rax
ba: 48 01 d0 add %rdx,%rax
bd: 48 c1 e0 04 shl $0x4,%rax
c1: 48 29 d0 sub %rdx,%rax
c4: 48 29 c1 sub %rax,%rcx
c7: 48 89 c8 mov %rcx,%rax
ca: 48 89 45 e0 mov %rax,-0x20(%rbp)
if (r != 0)
ce: 48 83 7d e0 00 cmpq $0x0,-0x20(%rbp)
d3: 74 05 je da <main+0xda>
{
r++;
d5: 48 83 45 e0 01 addq $0x1,-0x20(%rbp)
long long k = 0;
for (; k < 9999999999999; k++)
{
for (long long i = 0; i < 9999999999999; i++)
{
for (long long j = 0; j < 9999999999999; j++)
da: 48 83 45 f8 01 addq $0x1,-0x8(%rbp)
df: 48 b8 fe 9f 72 4e 18 movabs $0x9184e729ffe,%rax
e6: 09 00 00
e9: 48 39 45 f8 cmp %rax,-0x8(%rbp)
ed: 0f 8e 40 ff ff ff jle 33 <main+0x33>
int main() {
long long r = 0;
long long k = 0;
for (; k < 9999999999999; k++)
{
for (long long i = 0; i < 9999999999999; i++)
f3: 48 83 45 f0 01 addq $0x1,-0x10(%rbp)
f8: 48 b8 fe 9f 72 4e 18 movabs $0x9184e729ffe,%rax
ff: 09 00 00
102: 48 39 45 f0 cmp %rax,-0x10(%rbp)
106: 0f 8e 1a ff ff ff jle 26 <main+0x26>
int main() {
long long r = 0;
long long k = 0;
for (; k < 9999999999999; k++)
10c: 48 83 45 e8 01 addq $0x1,-0x18(%rbp)
111: 48 b8 fe 9f 72 4e 18 movabs $0x9184e729ffe,%rax
118: 09 00 00
11b: 48 39 45 e8 cmp %rax,-0x18(%rbp)
11f: 0f 8e f4 fe ff ff jle 19 <main+0x19>
125: b8 00 00 00 00 mov $0x0,%eax
r++;
}
}
}
}
}
12a: 5d pop %rbp
12b: c3 retq
so all the loops seems present.
With -O3
:
0000000000000000 <main>:
0: 31 c0 xor %eax,%eax
2: c3 retq
so it was clearly optimized away. Nothing in the C standard prevents that optimization, so GCC does it.
If we add a printf("%lld", r);
at the end as mentioned by Mystical, it does take forever even with -O3
, and the generated optimized code is almost the same as the non optimized one.
For infinite loops it gets more interesting: Are compilers allowed to eliminate infinite loops?
How to prevent if from happening: How to prevent GCC from optimizing out a busy wait loop?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With