Here are two functions which I claim do exactly the same thing:
bool fast(int x) { return x & 4242; }
bool slow(int x) { return x && (x & 4242); }
Logically they do the same thing, and just to be 100% sure I wrote a test that ran all four billion possible inputs through both of them (a sketch of that check is included further down), and the results matched. But the assembly code is a different story:
fast:
        andl    $4242, %edi
        setne   %al
        ret

slow:
        xorl    %eax, %eax
        testl   %edi, %edi
        je      .L3
        andl    $4242, %edi
        setne   %al
.L3:
        rep ret
I was surprised that GCC could not make the leap of logic to eliminate the redundant test. I tried g++ 4.4.3 and 4.7.2 with -O2, -O3, and -Os, all of which generated the same code. The platform is Linux x86_64.
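(For anyone who wants to reproduce this: the assembly above can be obtained with commands along these lines, where test.cpp is a placeholder file containing the two functions; the exact output will vary by GCC version.)

g++ -O2 -S -o - test.cpp
g++ -O3 -S -o - test.cpp
g++ -Os -S -o - test.cpp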
Can someone explain why GCC isn't smart enough to generate the same code in both cases? I'd also like to know whether other compilers can do better.
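For reference, the exhaustive check mentioned above amounts to something like the following sketch (the exact code may have differed, but the idea is simply to walk every 32-bit pattern and compare the two functions):

#include <cstdint>
#include <cstdio>

bool fast(int x) { return x & 4242; }
bool slow(int x) { return x && (x & 4242); }

int main()
{
    std::uint32_t u = 0;
    do {
        int x = static_cast<int>(u);   // reinterpret each 32-bit pattern as int
        if (fast(x) != slow(x)) {
            std::printf("mismatch at %d\n", x);
            return 1;
        }
    } while (++u != 0);                // exits after wrapping past 0xFFFFFFFF
    std::puts("all four billion inputs match");
    return 0;
}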
Edit to add test harness:
#include <cstdlib>
#include <vector>
using namespace std;

int main(int argc, char* argv[])
{
    // make vector filled with numbers starting from argv[1]
    int seed = atoi(argv[1]);
    vector<int> v(100000);
    for (int j = 0; j < 100000; ++j)
        v[j] = j + seed;

    // count how many times the function returns true
    int result = 0;
    for (int j = 0; j < 100000; ++j)
        for (int i : v)
            result += slow(i);  // or fast(i), try both

    return result;
}
I tested the above with clang 5.1 on Mac OS with -O3. It took 2.9 seconds using fast() and 3.8 seconds using slow(). If I instead use a vector of all zeros, there is no significant difference in performance between the two functions.
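(The timings came from something like the following; bench.cpp here is a placeholder for the harness above plus the function under test, and for the all-zeros case the fill loop is changed to store 0 instead of j + seed.)

clang++ -std=c++11 -O3 -o bench bench.cpp
time ./bench 1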
Compiler-specific pragma: GCC provides #pragma GCC as a way to temporarily control compiler behavior. By using #pragma GCC optimize("O0"), the optimization level can be set to zero, which means GCC performs no optimization at all for the code that follows.
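A minimal sketch of how that looks in practice (the function name here is only illustrative):

#pragma GCC push_options
#pragma GCC optimize ("O0")     // code below is compiled with no optimization
bool slow_unoptimized(int x) { return x && (x & 4242); }
#pragma GCC pop_options         // restore the command-line optimization level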
GCC has a range of optimization levels, plus individual options to enable or disable particular optimizations. The overall optimization level is controlled by the command-line option -On, where n is the required optimization level; the default is -O0. At -O2, GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff; compared to -O, this option increases both compilation time and the performance of the generated code. With -Os, the compiler optimizes to reduce the size of the binary instead of execution speed. If you do not specify an optimization option, GCC attempts to reduce compilation time and to make debugging always yield the results expected from reading the source code.
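By way of example, the levels mentioned above are selected on the command line roughly like this (test.cpp again being a placeholder source file):

g++ -O0 -c test.cpp    # default: fastest compilation, easiest debugging
g++ -O2 -c test.cpp    # optimize for speed without large size increases
g++ -Os -c test.cpp    # optimize for binary size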
Exactly why should it be able to optimize the code? You're assuming that any transformation that works will be done. That's not at all how optimizers work. They're not artificial intelligences; they work by replacing known patterns parametrically. E.g., Common Subexpression Elimination scans an expression for common subexpressions and hoists them, provided that doing so does not change side effects.
(BTW, CSE shows that optimizers are already quite aware of what code movement is allowed in the possible presence of side effects. They know that you have to be careful with &&: whether expr && expr can be CSE-optimized or not depends on the side effects of expr.)
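As a rough illustration of that point (the names here are made up for the example), a left operand with a side effect cannot simply be merged with a later occurrence of the same expression:

int counter = 0;
int next_value() { return ++counter; }   // observable side effect

bool g()
{
    // The two calls are not a common subexpression: && short-circuits,
    // so the second call may never happen, and merging the two calls
    // would change how many times counter is incremented.
    return next_value() && (next_value() & 4242);
}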
So, in summary: which pattern do you think applies here?