Here's some code (full program follows later in the question):
template <typename T>
T fizzbuzz(T n) {
    T count(0);
#if CONST
    const T div(3);
#else
    T div(3);
#endif
    for (T i(0); i <= n; ++i) {
        if (i % div == T(0)) count += i;
    }
    return count;
}
Now, if I call this template function with int, then I get a factor of 6 performance difference depending on whether I define CONST or not:
$ gcc --version
gcc (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
$ make -B wrappedint CPPFLAGS="-O3 -Wall -Werror -DWRAP=0 -DCONST=0" &&
time ./wrappedint
g++ -O3 -Wall -Werror -DWRAP=0 -DCONST=0 wrappedint.cpp -o wrappedint
484573652
real 0m2.543s
user 0m2.059s
sys 0m0.046s
$ make -B wrappedint CPPFLAGS="-O3 -Wall -Werror -DWRAP=0 -DCONST=1" &&
time ./wrappedint
g++ -O3 -Wall -Werror -DWRAP=0 -DCONST=1 wrappedint.cpp -o wrappedint
484573652
real 0m0.655s
user 0m0.327s
sys 0m0.046s
Examining the disassembly shows that in the fast (const) case the modulo has been turned into a multiply-and-shift sequence, whereas in the slow (non-const) case it uses idivl.
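For illustration only (this is not part of the original program), the "multiply and shift" trick looks roughly like the following sketch. The constant and shift are the standard ones for unsigned 32-bit division by 3; the signed sequence the compiler actually emits has an extra adjustment for negative values.

#include <cassert>
#include <stdint.h>

// Roughly what the strength reduction amounts to: for unsigned 32-bit x,
// x / 3 equals a 64-bit multiply by the magic constant 0xAAAAAAAB followed
// by a shift, and the remainder is then recovered without any idivl.
inline uint32_t div3(uint32_t x) { return (uint32_t)((x * 0xAAAAAAABULL) >> 33); }
inline uint32_t mod3(uint32_t x) { return x - 3 * div3(x); }

int main() {
    for (uint32_t x = 0; x < 100000; ++x)
        assert(mod3(x) == x % 3);   // spot-check the identity
}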
Even worse, if I try to wrap my integer in a class, the optimisation doesn't happen whether I use const or not. The code always uses idivl and runs slowly:
#include <iostream>

struct WrappedInt {
    int v;
    explicit WrappedInt(const int &val) : v(val) {}
    bool operator<=(const WrappedInt &rhs) const { return v <= rhs.v; }
    bool operator==(const WrappedInt &rhs) const { return v == rhs.v; }
    WrappedInt &operator++() { ++v; return *this; }
    WrappedInt &operator+=(const WrappedInt &rhs) { v += rhs.v; return *this; }
    WrappedInt operator%(const WrappedInt &rhs) const { return WrappedInt(v % rhs.v); }
};

std::ostream &operator<<(std::ostream &s, WrappedInt w) {
    return s << w.v;
}
template <typename T>
T fizzbuzz(T n) {
    T count(0);
#if CONST
    const T div(3);
#else
    T div(3);
#endif
    for (T i(0); i <= n; ++i) {
        if (i % div == T(0)) count += i;
    }
    return count;
}
int main() {
#if WRAP
    WrappedInt w(123456789);
    std::cout << fizzbuzz(w) << "\n";
#else
    std::cout << fizzbuzz<int>(123456789) << "\n";
#endif
}
My questions are:
1) Is there a simple principle of C++ itself, or gcc's optimisation, which explains why this happens, or is it just a case of "various heuristics run, this is the code you get"?
2) Is there any way to make the compiler realise that my locally-declared and never-referenced const WrappedInt can be treated as a compile-time const value? I want this thing to be a straight replacement for int in templates.
3) Is there a known way of wrapping an int such that the compiler can discard the wrapping when optimising? The goal is that WrappedInt will be a policy-based template. But if a "do-nothing" policy results in essentially arbitrary 6x speed penalties over int, I'm better off special-casing that situation and using int directly.
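For context, a minimal sketch of the kind of "do-nothing policy" wrapper question 3 describes might look like this (the names are purely illustrative, not from the original code); ideally the NullPolicy instantiation would compile down to plain int arithmetic:

struct NullPolicy {
    static void check(int) {}   // a real policy might range-check or log here
};

template <typename Policy>
struct PolicyInt {
    int v;
    explicit PolicyInt(int val) : v(val) { Policy::check(v); }
    bool operator<=(const PolicyInt &rhs) const { return v <= rhs.v; }
    bool operator==(const PolicyInt &rhs) const { return v == rhs.v; }
    PolicyInt &operator++() { ++v; Policy::check(v); return *this; }
    PolicyInt &operator+=(const PolicyInt &rhs) { v += rhs.v; Policy::check(v); return *this; }
    PolicyInt operator%(const PolicyInt &rhs) const { return PolicyInt(v % rhs.v); }
};

// e.g. fizzbuzz(PolicyInt<NullPolicy>(123456789)); printing would additionally
// need an operator<< like the WrappedInt one above.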
The -O option to gcc controls compiler optimisation. At -O0 (the default, equivalent to not passing -O at all), no optimisation is performed; this keeps compilation fast and debugging predictable. At -O2, GCC performs nearly all supported optimisations that do not involve a space-speed tradeoff, increasing both compilation time and the performance of the generated code. So yes, with optimisation enabled, compilers generate very good code for simple calculations like this.
I'm guessing it's just the severely old GCC version you are running. The oldest compiler I have on my machine, gcc 4.1.2, compiles both the non-const and the wrapped versions the fast way (and does so at only -O1).
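If you are stuck on an older compiler, one possible workaround sketch (not from the answer above, and untested on anything as old as gcc 3.4.4) is to bake the divisor into a non-type template parameter, so it is a compile-time constant by construction rather than a local the optimiser has to track:

// Hypothetical variant, not from the original post.
template <typename T, int Div>
T fizzbuzz_fixed(T n) {
    T count(0);
    for (T i(0); i <= n; ++i) {
        if (i % T(Div) == T(0)) count += i;
    }
    return count;
}

// usage: std::cout << fizzbuzz_fixed<int, 3>(123456789) << "\n";

Whether the WrappedInt instantiation then gets strength-reduced still depends on the compiler inlining the wrapped operator% down to a plain v % 3.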