can custom C++ classes replicate the performance of inbuilt types?

Question

I am trying to create a C++ class that behaves exactly like the inbuilt int type with one exception: everywhere that operator* (or operator*=) is called, addition is called instead.

At first, the performance of my class was very poor (1/2 that of the inbuilt int type), but I noticed this was because I forgot to include the copy constructor below:

struct AlmostInt {                                                                                                                                                                       

  AlmostInt () { }                
  AlmostInt (const AlmostInt  &a) : val(a.val) { }  // forgetting this killed
                                                    // performance

  AlmostInt operator+(const AlmostInt &a) const { AlmostInt result = *this;
                                          result.val += a.val;
                                          return result; }
  AlmostInt operator-(const AlmostInt &a) const { AlmostInt result = *this;
                                          result.val -= a.val;
                                          return result; }
  AlmostInt operator*(const AlmostInt &a) const { AlmostInt result = *this;
                                          result.val  = result.val + a.val;      
                                          return result; }
  AlmostInt &operator+=(const AlmostInt &a) { this->val += a.val;                           
                                              return *this; }
  AlmostInt &operator-=(const AlmostInt &a) { this->val -= a.val;        
                                              return *this; }
  AlmostInt &operator*=(const AlmostInt &a) { this->val = this->val + a.val);     
                                              return *this; }

private:
  int val;
};

Unfortunately, my program remains 25% slower than it should be. Examining the assembly generated for the two different versions of the program (one using int, the other using AlmostInt), I see that there is an identical number of + and - operations, so things are "working" at some level.

The problem is that there are significantly more load and store operations in the code using the AlmostInt class and not the native int operation.

Does anyone have any ideas on where this overhead might be coming from? The only guess I had was that perhaps the compiler doesn't understand that AlmostInt has all the same properties int does (e.g. associativity, commutativity), but if this were really a problem, I would have expected a different number of '+' or '-' instructions in the code, and this doesn't happen.

I suspect that the additional loads and stores are related to extra stack activity, but all I can say at this point is it isn't merely a few extra stack loads and stores at the top and bottom of each function, but the extra loads and stores occur throughout the code.

Any ideas? I wonder if anyone can point me to a compiler that does allow one to reach int's level of performance with a custom class.

UPDATE:

Here is a simple function you can cut and paste to see what's going on for yourself. On x86-64 Linux (g++ 4.3, 4.4), AIX6 xlC and a couple of other platforms, changing the 'CHOOSE ONE...' lines below should lead to the same code being generated (or at least code of the same performance), but in practice the code bloats significantly. Can anyone explain what is going on (for any particular platform/compiler), or how to fix it?

class AlmostInt
{
    int value;

public:

    AlmostInt& operator+=(AlmostInt that)
    {
        value += that.value;
        return *this;
    }

    AlmostInt& operator-=(AlmostInt that)
    {
        value -= that.value;
        return *this;
    }

        AlmostInt& operator*=(AlmostInt that)
    {
        value *= that.value;
        return *this;
    }
};

AlmostInt operator+(AlmostInt lhs, AlmostInt rhs)
{
    lhs += rhs;
    return lhs;
}

AlmostInt operator-(AlmostInt lhs, AlmostInt rhs)
{
    lhs -= rhs;
    return lhs;
}

AlmostInt operator*(AlmostInt lhs, AlmostInt rhs)
{
    lhs *= rhs;
    return lhs;
}

// CHOOSE ONE OF THE FOLLOWING TWO LINES:
//typedef int real;
typedef AlmostInt real;

typedef struct {
  real re;
  real im;
} complex;

#define R(a0,a1,b0,b1,wre,wim) { \
  t1 = a0 - a1;  t2 = b0 - b1; \
  t5 = t1 * wim; t6 = t2 * wim; \
  t3 = a0;  t1 *= wre; \
  t3 += a1; t2 *= wre; \
  t1 -= t6; t4 = b0; \
  t2 += t5; t4 += b1; \
  a0 = t3;  b1 = t2; \
  a1 = t4;  b0 = t1; \
}

#define RZERO(a0,a1,b0,b1) { \
  t1 = a0 - a1; t2 = b0 - b1; \
  t3 = a0 + a1; t4 = b0 + b1; \
  b0 = t1; a0 = t3; \
  b1 = t2; a1 = t4; \
}

void rpass(real *a, const complex *w, unsigned int n)
{
  real t1, t2, t3, t4, t5, t6, t7, t8;
  real *b;
  unsigned int k;

  b = a + 4 * n;
  k = n - 2;

  RZERO(a[0],a[1],b[0],b[1]);
  R(a[2],a[3],b[2],b[3],w[0].re,w[0].im);
  R(a[4],a[5],b[4],b[5],w[1].re,w[1].im);
  R(a[6],a[7],b[6],b[7],w[2].re,w[2].im);

  for (;;) {
    R(a[8],a[9],b[8],b[9],w[3].re,w[3].im);
    R(a[10],a[11],b[10],b[11],w[4].re,w[4].im);
    R(a[12],a[13],b[12],b[13],w[5].re,w[5].im);
    R(a[14],a[15],b[14],b[15],w[6].re,w[6].im);
    if (!(k -= 2)) break;
    a += 8;
    b += 8;
    w += 4;
  }
}

(Credit where credit's due: this little benchmark comes from the 'djbfft' library by Dan Bernstein)

James Kanze · Accepted Answer

One of most frequent reason for performance loss in these sort of cases is returning values from functions. In theory, a compiler should be able to optimize this, and do the same thing as if you returned an int (provided that all relevant functions are inlined); in practice, all of the compilers I know will return an int in a register, but for a class type, will pass an additional hidden argument with the address of a temporary, and return the value in memory at this address. The reason is that things like the copy constructor or assignment require an address (the this pointer, the reference to what is being copied), and the compiler doesn't seem to recognize that once it's inlined all of the functions, the address won't be necessary any more. (There's also the fact that the binary API says to do it this way, but the binary API typically only concerns structures, not types with non-trivial constructors, destructors and assignment operators.)

fredoverflow · Answer

I would get rid of the constructors, replace call by reference-to-const with call by value (because the AlmostInt objects are really small), and implement the non-modifying operators as free functions:

class AlmostInt
{
    int value;

public:

    AlmostInt& operator+=(AlmostInt that)
    {
        value += that.value;
        return *this;
    }

    AlmostInt& operator-=(AlmostInt that)
    {
        value -= that.value;
        return *this;
    }

    AlmostInt& operator*=(AlmostInt that)
    {
        value *= that.value;
        return *this;
    }
};

AlmostInt operator+(AlmostInt lhs, AlmostInt rhs)
{
    lhs += rhs;
    return lhs;
}

AlmostInt operator-(AlmostInt lhs, AlmostInt rhs)
{
    lhs -= rhs;
    return lhs;
}

AlmostInt operator*(AlmostInt lhs, AlmostInt rhs)
{
    lhs *= rhs;
    return lhs;
}

This should have the potential to get rid of some unnecessary overhead.

can custom C++ classes replicate the performance of inbuilt types?

Tags:

c++

stack

operator-overloading

embedded

Fumiyo Eda

2 Answers

James Kanze

fredoverflow

Recent Activity

Donate For Us

can custom C++ classes replicate the performance of inbuilt types?

Tags:

c++

stack

operator-overloading

embedded

Fumiyo Eda

2 Answers

James Kanze

fredoverflow

Related questions

Recent Activity

Donate For Us