When I compile and run this code with Clang (-O3) or MSVC (/O2)...

    #include <stdio.h>
    #include <time.h>

    static int const N = 0x8000;

    int main()
    {
        clock_t const start = clock();

        for (int i = 0; i < N; ++i)
        {
            int a[N]; // Never used outside of this block, but not optimized away

            for (int j = 0; j < N; ++j)
            {
                ++a[j]; // This is undefined behavior (due to possible
                        // signed integer overflow), but Clang doesn't see it
            }
        }

        clock_t const finish = clock();

        fprintf(stderr, "%u ms\n",
                static_cast<unsigned int>((finish - start) * 1000 / CLOCKS_PER_SEC));

        return 0;
    }

... the loop doesn't get optimized away.
Furthermore, neither Clang 3.6 nor Visual C++ 2013 nor GCC 4.8.1 tells me that the variable is uninitialized!
Now I realize that the lack of an optimization isn't a bug per se, but I find this astonishing given how compilers are supposed to be pretty smart nowadays. This seems like such a simple piece of code that even liveness analysis techniques from a decade ago should be able to take care of optimizing away the variable a, and therefore the whole loop -- never mind the fact that incrementing the variable is already undefined behavior. Yet only GCC is able to figure out that it's a no-op, and none of the compilers tells me that this is an uninitialized variable.

Why is this? What's preventing simple liveness analysis from telling the compiler that a is unused? Moreover, why isn't the compiler detecting that a[j] is uninitialized in the first place? Why can't the existing uninitialized-variable detectors in all of those compilers catch this obvious error?
Compilers are free to optimize code so long as they can guarantee the semantics of the code are not changed.
Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.
The C/C++ compiler compiles each source file separately and produces a corresponding object file. This means the compiler can only apply optimizations within a single source file, not across the whole program. However, some important optimizations can be performed only by looking at the whole program.
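As a side note (my addition, not part of the original answer): the usual way to let the optimizer see across translation units is link-time optimization, enabled on Clang and GCC with -flto. A minimal sketch, assuming a Clang toolchain and hypothetical file names:

    // helper.cpp -- hypothetical second translation unit
    int square(int x) { return x * x; }

    // main.cpp
    int square(int x); // body not visible while compiling this file
    int main() { return square(7); }

And the corresponding build commands:

    # Separate compilation: while optimizing main.cpp the compiler cannot see
    # the body of square(), so the call cannot be inlined or folded.
    clang++ -O3 -c helper.cpp main.cpp
    clang++ -O3 helper.o main.o

    # With link-time optimization the optimizer sees both translation units
    # at link time and can inline square() into main().
    clang++ -O3 -flto -c helper.cpp main.cpp
    clang++ -O3 -flto helper.o main.o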
The undefined behavior is irrelevant here. Replacing the inner loop with:
    for (int j = 1; j < N; ++j)
    {
        a[j-1] = a[j];
        a[j] = j;
    }
... has the same effect, at least with Clang.
The issue is that the inner loop both loads from a[j] (for some j) and stores to a[j] (for some j). None of the stores can be removed, because the compiler believes they may be visible to later loads, and none of the loads can be removed, because their values are used (as input to the later stores). As a result, the loop still has side-effects on memory, so the compiler doesn't see that it can be deleted.
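To illustrate the point with a variant of my own (not code from the question): if the inner loop only stores and never loads, no load keeps the stores alive, so the compiler is at least in a position to prove the whole loop dead. Whether a particular compiler actually deletes it is worth checking on your own toolchain:

    for (int i = 0; i < N; ++i)
    {
        int a[N];

        for (int j = 0; j < N; ++j)
        {
            a[j] = j; // store only: no load feeds a later store, so nothing
                      // written here can ever be observed
        }
    }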
Contrary to n.m.'s answer, replacing int with unsigned does not make the problem go away. The code generated by Clang 3.4.1 using int and using unsigned int is identical.
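For reference, the unsigned variant being discussed is just the direct substitution (my transcription, not code posted in the question):

    unsigned int a[N]; // same block-local array, but with unsigned elements

    for (int j = 0; j < N; ++j)
    {
        ++a[j]; // unsigned wraparound is well-defined, yet per the test above
                // Clang 3.4.1 generated identical code for both versions
    }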
It's an interesting issue with regard to optimizing. I would expect that in most cases, the compiler would treat each element of the array as an individual variable when doing dead-code analysis. And 0x8000 makes far too many individual variables to track, so the compiler doesn't try. The fact that a[j] doesn't always access the same object could cause problems as well for the optimizer.
Obviously, different compilers use different heuristics; a compiler could treat the array as a single object and detect that it never affects observable behavior. Some compilers may choose not to, however, on the grounds that it's typically a lot of work for very little gain: how often would such an optimization be applicable in real code?
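As a rough illustration of why scale matters (my own sketch, not part of the original answers): collapse the array to a single scalar and liveness tracking becomes trivial, so I would expect the loop to be removed easily, and an uninitialized-use warning becomes much more likely to fire:

    for (int i = 0; i < N; ++i)
    {
        int a; // a single scalar instead of a 0x8000-element array

        for (int j = 0; j < N; ++j)
        {
            ++a; // still an indeterminate-value read, but liveness of one
                 // scalar is easy to track, so the whole loop is plainly dead
        }
    }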