
How to optimize a simple numeric type wrapper class in C++?

I am trying to implement a fixed-point class in C++, but I am running into performance problems. I have reduced the problem to a simple wrapper around the float type, and it is still slow. My question is: why is the compiler unable to optimize it fully?

The 'float' version is 50% faster than 'Float'. Why?!

(I use Visual C++ 2008 in the Release configuration, of course, and I have tested all the compiler options I could find.)

See the code below:

#include <cstdio>
#include <cstdlib>
#include "Clock.h"      // just for measuring time

#define real Float      // Option 1
//#define real float        // Option 2

struct Float
{
private:
    float value;

public:
    Float(float value) : value(value) {}
    operator float() const { return value; }

    Float& operator=(const Float& rhs)
    {
        value = rhs.value;
        return *this;
    }

    Float operator+ (const Float& rhs) const
    {
        return Float( value + rhs.value );
    }

    Float operator- (const Float& rhs) const
    {
        return Float( value - rhs.value );
    }

    Float operator* (const Float& rhs) const
    {
        return Float( value * rhs.value );
    }

    bool operator< (const Float& rhs) const
    {
        return value < rhs.value;
    }
};

struct Point
{
    Point() : x(0), y(0) {}
    Point(real x, real y) : x(x), y(y) {}

    real x;
    real y;
};

int main()
{
    // Generate data
    const int N = 30000;
    Point points[N];
    for (int i = 0; i < N; ++i)
    {
        points[i].x = (real)(640.0f * rand() / RAND_MAX);
        points[i].y = (real)(640.0f * rand() / RAND_MAX);
    }

    real limit( 20 * 20 );

    // Check how many pairs of points are closer than 20
    Clock clk;

    int count = 0;
    for (int i = 0; i < N; ++i)
    {
        for (int j = i + 1; j < N; ++j)
        {
            real dx = points[i].x - points[j].x;
            real dy = points[i].y - points[j].y;
            real d2 = dx * dx + dy * dy;
            if ( d2 < limit )
            {
                count++;
            }
        }
    }

    double time = clk.time();

    printf("%d\n", count);
    printf("TIME: %lf\n", time);

    return 0;
}
Michal Czardybon asked Jul 19 '11 08:07


2 Answers

In my opinion, this comes down to optimization flags. I compiled your program with g++ on a 64-bit Linux machine. Without any optimization I see the same result you describe: a difference of roughly 50% between the two versions.

With maximum optimization turned on (i.e. -O4), both versions run at the same speed. Turn on optimization and check again.
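To make that comparison easier to repeat, here is a rough sketch that builds both variants in one file through a template instead of the `#define real` switch. It assumes a C++11 compiler (so not Visual C++ 2008) and uses `std::chrono` in place of the question's `Clock.h`; the names `PointT`, `count_close_pairs` and `time_version` are made up for this illustration and are not part of the original code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Wrapper from the question, reduced to the operators the loop uses.
// The default argument is added here so std::vector can default-construct it.
struct Float {
    float value;
    Float(float v = 0.0f) : value(v) {}
    Float operator+(const Float& rhs) const { return Float(value + rhs.value); }
    Float operator-(const Float& rhs) const { return Float(value - rhs.value); }
    Float operator*(const Float& rhs) const { return Float(value * rhs.value); }
    bool operator<(const Float& rhs) const { return value < rhs.value; }
};

// Point parameterized on the numeric type (hypothetical name).
template <typename Real>
struct PointT {
    Real x;
    Real y;
};

// Counts pairs of points whose squared distance is below `limit`.
template <typename Real>
int count_close_pairs(const std::vector<PointT<Real>>& pts, Real limit) {
    int count = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            Real dx = pts[i].x - pts[j].x;
            Real dy = pts[i].y - pts[j].y;
            Real d2 = dx * dx + dy * dy;
            if (d2 < limit) ++count;
        }
    }
    return count;
}

// Generates n pseudo-random points, times the pair loop, and returns seconds.
// The pair count is written to *count_out so the work cannot be discarded.
template <typename Real>
double time_version(int n, int* count_out) {
    assert(n > 0 && count_out != nullptr);
    std::srand(42);  // fixed seed: both instantiations see the same points
    std::vector<PointT<Real>> pts(static_cast<std::size_t>(n));
    for (std::size_t i = 0; i < pts.size(); ++i) {
        pts[i].x = Real(640.0f * std::rand() / RAND_MAX);
        pts[i].y = Real(640.0f * std::rand() / RAND_MAX);
    }
    const auto start = std::chrono::steady_clock::now();
    *count_out = count_close_pairs(pts, Real(20.0f * 20.0f));
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_version<float>(30000, &count)` and `time_version<Float>(30000, &count)` from `main`, printing both times, and building the file once with `g++ -O0` and once with `g++ -O2` should show whether the gap is only there in unoptimized builds.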

iammilind answered Oct 22 '22 02:10


Try not passing by reference. Your class is small enough that the overhead of passing it by reference (yes, there is overhead if the compiler doesn't optimize it out) might be higher than just copying the object. So this...

Float operator+ (const Float& rhs) const
{
   return Float( value + rhs.value );
}

becomes something like this...

Float operator+ (Float rhs) const
{
   rhs.value+=value;
   return rhs;
}

which avoids a temporary object and may avoid the indirection of a pointer dereference.
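Applied to the whole class, the same idea might look like the sketch below. It is only an illustration: the `get()` accessor and the `check_operand_order` helper are hypothetical additions, and whether any of this is measurably faster depends on the compiler and its settings. The one place that needs care is subtraction, because the by-value parameter is the right-hand operand and the order matters.

```cpp
#include <cassert>

// Float wrapper with every operator taking its argument by value.
class Float {
    float value;

public:
    Float(float v) : value(v) {}

    // Hypothetical accessor, used here instead of the implicit conversion.
    float get() const { return value; }

    // rhs is already a private copy, so it can be reused as the result.
    Float operator+(Float rhs) const {
        rhs.value += value;
        return rhs;
    }

    // Not commutative: compute (this - rhs), not (rhs - this).
    Float operator-(Float rhs) const {
        rhs.value = value - rhs.value;
        return rhs;
    }

    Float operator*(Float rhs) const {
        rhs.value *= value;
        return rhs;
    }

    bool operator<(Float rhs) const { return value < rhs.value; }
};

// Small usage check: subtraction must keep its operand order.
void check_operand_order() {
    Float a(2.0f), b(3.0f);
    assert((a - b).get() == -1.0f);
    assert((b - a).get() == 1.0f);
}
```

With optimization enabled, a compiler that inlines these one-line members will most likely generate the same code for both the by-reference and the by-value forms, so it is worth timing before and after.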

Skyler Saleh answered Oct 22 '22 04:10