I am trying to create a C++ class that behaves exactly like the built-in int
type, with one exception: everywhere operator* (or operator*=) is called, addition is performed instead.
At first, the performance of my class was very poor (half that of the built-in int
type), but I noticed this was because I had forgotten to include the copy constructor below:
struct AlmostInt {
    AlmostInt() { }
    AlmostInt(const AlmostInt &a) : val(a.val) { }  // forgetting this killed performance

    AlmostInt operator+(const AlmostInt &a) const {
        AlmostInt result = *this;
        result.val += a.val;
        return result;
    }
    AlmostInt operator-(const AlmostInt &a) const {
        AlmostInt result = *this;
        result.val -= a.val;
        return result;
    }
    AlmostInt operator*(const AlmostInt &a) const {
        AlmostInt result = *this;
        result.val = result.val + a.val;  // '*' deliberately performs '+'
        return result;
    }

    AlmostInt &operator+=(const AlmostInt &a) { val += a.val; return *this; }
    AlmostInt &operator-=(const AlmostInt &a) { val -= a.val; return *this; }
    AlmostInt &operator*=(const AlmostInt &a) { val = val + a.val; return *this; }  // '*' performs '+'

private:
    int val;
};
Unfortunately, my program remains 25% slower than it should be. Examining the assembly generated for the two versions of the program (one using int, the other using AlmostInt), I see an identical number of add and subtract instructions, so things are "working" at some level.
The problem is that the code using the AlmostInt class contains significantly more load and store instructions than the code using the native int type.
Does anyone have any ideas where this overhead might be coming from? My only guess
was that perhaps the compiler doesn't understand that AlmostInt has all the
same algebraic properties int does (e.g. associativity, commutativity), but if that were really
the problem, I would expect a different number of add or subtract instructions in the code, and that doesn't happen.
I suspect the additional loads and stores are related to extra stack activity, but all I can say at this point is that it isn't merely a few extra stack accesses at the top and bottom of each function; the extra loads and stores occur throughout the code.
Any ideas? Can anyone point me to a compiler that does allow
one to reach int's level of performance with a custom class.
UPDATE:
Here is a simple function you can cut and paste to see what's going on for yourself. On x86-64 Linux (g++ 4.3, 4.4), AIX 6 xlC, and a couple of other platforms, switching between the 'CHOOSE ONE...' lines below should lead to the same code being generated (or at least code of the same performance). In practice, the code bloats significantly. Can anyone explain what is going on (for any particular platform/compiler), or how to fix it?
class AlmostInt
{
    int value;
public:
    AlmostInt& operator+=(AlmostInt that)
    {
        value += that.value;
        return *this;
    }
    AlmostInt& operator-=(AlmostInt that)
    {
        value -= that.value;
        return *this;
    }
    AlmostInt& operator*=(AlmostInt that)
    {
        value *= that.value;
        return *this;
    }
};

AlmostInt operator+(AlmostInt lhs, AlmostInt rhs)
{
    lhs += rhs;
    return lhs;
}

AlmostInt operator-(AlmostInt lhs, AlmostInt rhs)
{
    lhs -= rhs;
    return lhs;
}

AlmostInt operator*(AlmostInt lhs, AlmostInt rhs)
{
    lhs *= rhs;
    return lhs;
}
// CHOOSE ONE OF THE FOLLOWING TWO LINES:
//typedef int real;
typedef AlmostInt real;
typedef struct {
    real re;
    real im;
} complex;
#define R(a0,a1,b0,b1,wre,wim) { \
t1 = a0 - a1; t2 = b0 - b1; \
t5 = t1 * wim; t6 = t2 * wim; \
t3 = a0; t1 *= wre; \
t3 += a1; t2 *= wre; \
t1 -= t6; t4 = b0; \
t2 += t5; t4 += b1; \
a0 = t3; b1 = t2; \
a1 = t4; b0 = t1; \
}
#define RZERO(a0,a1,b0,b1) { \
t1 = a0 - a1; t2 = b0 - b1; \
t3 = a0 + a1; t4 = b0 + b1; \
b0 = t1; a0 = t3; \
b1 = t2; a1 = t4; \
}
void rpass(real *a, const complex *w, unsigned int n)
{
    real t1, t2, t3, t4, t5, t6, t7, t8;
    real *b;
    unsigned int k;

    b = a + 4 * n;
    k = n - 2;

    RZERO(a[0],a[1],b[0],b[1]);
    R(a[2],a[3],b[2],b[3],w[0].re,w[0].im);
    R(a[4],a[5],b[4],b[5],w[1].re,w[1].im);
    R(a[6],a[7],b[6],b[7],w[2].re,w[2].im);
    for (;;) {
        R(a[8],a[9],b[8],b[9],w[3].re,w[3].im);
        R(a[10],a[11],b[10],b[11],w[4].re,w[4].im);
        R(a[12],a[13],b[12],b[13],w[5].re,w[5].im);
        R(a[14],a[15],b[14],b[15],w[6].re,w[6].im);
        if (!(k -= 2)) break;
        a += 8;
        b += 8;
        w += 4;
    }
}
(Credit where credit's due: this little benchmark comes from the 'djbfft' library by Dan Bernstein)
One of the most frequent reasons for performance loss in this sort of case
is returning values from functions. In theory, a compiler should be
able to optimize this and do the same thing as if you returned an int
(provided that all the relevant functions are inlined); in practice, all of
the compilers I know will return an int in a register, but for a class
type will pass an additional hidden argument with the address of a
temporary, and return the value in memory at that address. The reason
is that things like the copy constructor or assignment require an
address (the this pointer, the reference to what is being copied), and
the compiler doesn't seem to recognize that once it has inlined all of the
functions, the address is no longer necessary. (There's also the
fact that the binary ABI says to do it this way, but the ABI
typically only constrains plain structures, not types with non-trivial
constructors, destructors, and assignment operators.)
I would get rid of the constructors, replace pass-by-reference-to-const with pass-by-value (because AlmostInt
objects are really small), and implement the non-modifying operators as free functions:
class AlmostInt
{
    int value;
public:
    AlmostInt& operator+=(AlmostInt that)
    {
        value += that.value;
        return *this;
    }
    AlmostInt& operator-=(AlmostInt that)
    {
        value -= that.value;
        return *this;
    }
    AlmostInt& operator*=(AlmostInt that)
    {
        value *= that.value;
        return *this;
    }
};

AlmostInt operator+(AlmostInt lhs, AlmostInt rhs)
{
    lhs += rhs;
    return lhs;
}

AlmostInt operator-(AlmostInt lhs, AlmostInt rhs)
{
    lhs -= rhs;
    return lhs;
}

AlmostInt operator*(AlmostInt lhs, AlmostInt rhs)
{
    lhs *= rhs;
    return lhs;
}
This should have the potential to get rid of some unnecessary overhead.