
Have different optimizations (plain, SSE, AVX) in the same executable with C/C++

I'm developing optimizations for my 3D calculations and I now have:

  • a "plain" version using the standard C language libraries,
  • an SSE optimized version that compiles using a preprocessor #define USE_SSE,
  • an AVX optimized version that compiles using a preprocessor #define USE_AVX

Is it possible to switch between the three versions at run time without compiling separate executables (e.g. by building separate library files and loading the "right" one dynamically; I don't know whether inline functions are suitable for that)? I'd also like to know what performance cost this kind of switch adds to the software.

elvencode asked Jan 18 '13 21:01



1 Answer

There are several solutions for this.

One is based on C++, where you'd create multiple classes: typically, you define an interface class, and use a factory function to give you an object of the correct derived class.

e.g.

#include <iostream>

class Matrix
{
public:
   virtual ~Matrix() {}
   virtual void Multiply(Matrix &result, const Matrix &a, const Matrix &b) = 0;
   // ...
};

class MatrixPlain : public Matrix
{
public:
   void Multiply(Matrix &result, const Matrix &a, const Matrix &b);
};

void MatrixPlain::Multiply(Matrix &result, const Matrix &a, const Matrix &b)
{
   // ... implementation goes here...
}

class MatrixSSE : public Matrix
{
public:
   void Multiply(Matrix &result, const Matrix &a, const Matrix &b);
};

void MatrixSSE::Multiply(Matrix &result, const Matrix &a, const Matrix &b)
{
   // ... implementation goes here...
}

// ... same thing for AVX (class MatrixAVX)...

// type_of_math is assumed to be set elsewhere, before the factory is called.
Matrix* factory()
{
    switch(type_of_math)
    {
       case PlainMath:
          return new MatrixPlain;

       case SSEMath:
          return new MatrixSSE;

       case AVXMath:
          return new MatrixAVX;

       default:
          std::cerr << "Error, unknown type of math..." << std::endl;
          return nullptr;
    }
}
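
The factory leaves open how `type_of_math` gets its value. One possible way to set it, as a rough sketch only: it assumes GCC or Clang on x86, where the `__builtin_cpu_supports` built-in is available, and falls back to the plain version everywhere else. The enum name `MathType` and the helper `detect_math_type` are made up for this illustration.

```cpp
// Hypothetical helper: report the widest instruction set the running CPU supports.
// Assumes GCC or Clang on x86; on any other platform it conservatively
// returns PlainMath.
enum MathType { PlainMath, SSEMath, AVXMath };

inline MathType detect_math_type()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    // Needed because the checks below run during static initialization.
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) return AVXMath;
    if (__builtin_cpu_supports("sse")) return SSEMath;
#endif
    return PlainMath;
}

// Evaluated once at startup; the factory's switch then just reads it.
static const MathType type_of_math = detect_math_type();
```

Because the check runs once, the per-call cost of the switch is only the virtual call made through the object the factory returns.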

Alternatively, as you suggest in the question, you can build shared libraries that expose a common interface, and dynamically load the one that matches the CPU.

Of course, if you implement the Matrix base class as your "plain" class, you can refine it stepwise: override only the functions where you actually find an optimized version beneficial, and rely on the base class for the functions where performance isn't highly critical.
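
A small sketch of that stepwise approach, with hypothetical names (`VecMath`, `VecMathSSE`) and a dot product standing in for the real 3D code: the base class holds the plain implementations, and the SSE class overrides only the one function assumed to be hot. The intrinsics are guarded by `__SSE__`, so on a build without SSE the override simply reuses the plain code.

```cpp
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// The base class doubles as the "plain" version.
class VecMath
{
public:
    virtual ~VecMath() {}

    // Plain dot product; assumed hot enough to be worth overriding.
    virtual float Dot(const float *a, const float *b, std::size_t n) const
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            sum += a[i] * b[i];
        return sum;
    }

    // Assumed not performance critical, so subclasses just inherit it.
    virtual void Scale(float *a, std::size_t n, float s) const
    {
        for (std::size_t i = 0; i < n; ++i)
            a[i] *= s;
    }
};

class VecMathSSE : public VecMath
{
public:
    float Dot(const float *a, const float *b, std::size_t n) const override
    {
#if defined(__SSE__)
        __m128 acc = _mm_setzero_ps();
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4)   // four floats per iteration
            acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
        for (; i < n; ++i)           // leftover elements
            sum += a[i] * b[i];
        return sum;
#else
        return VecMath::Dot(a, b, n); // no SSE at compile time: reuse the plain code
#endif
    }
    // Scale() is deliberately not overridden.
};
```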

Edit: You mention inline functions; if that is the level you are thinking of, I think you are looking at the wrong granularity. You want fairly large functions that each process quite a bit of data. Otherwise, most of your effort goes into getting the data into the right format, doing a few calculation instructions, and then putting the data back into memory.
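
To illustrate the granularity point, here is a minimal sketch with hypothetical names (`TranslateFn`, `translate_plain`, `translate`): the selectable function takes a whole array of points, so the indirect call is paid once per batch rather than once per point. Only the plain kernel is shown; an SSE or AVX kernel would share the same signature.

```cpp
#include <cstddef>

// Hypothetical kernel signature: one call processes `count` packed XYZ points.
typedef void (*TranslateFn)(float *xyz, std::size_t count,
                            float dx, float dy, float dz);

// Plain version of the kernel.
void translate_plain(float *xyz, std::size_t count,
                     float dx, float dy, float dz)
{
    for (std::size_t i = 0; i < count; ++i) {
        xyz[3 * i + 0] += dx;
        xyz[3 * i + 1] += dy;
        xyz[3 * i + 2] += dz;
    }
}

// Chosen once at startup; hard-wired to the plain kernel in this sketch.
static TranslateFn translate = translate_plain;
```

With thousands of points per call, the cost of going through the pointer is negligible next to the loop itself, which is not the case if the switchable unit is a single small vector operation.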

I would also consider how you store your data. Are you storing an array of (X, Y, Z, W) sets, or are you storing lots of X, lots of Y, lots of Z and lots of W in separate arrays [assuming we're doing 3D calculations]? Depending on how your calculation works, one layout or the other will give you the best benefit.

I did a fair bit of SSE and 3DNow! optimisation some years back, and the "trick" is often more about how you store the data, so that you can easily grab a "bundle" of the right kind of data in one go. If you have the data stored the wrong way, you will waste a lot of time "swizzling" it (moving data from one storage layout to another).
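
The two layouts can be sketched as follows. All names (`PointAoS`, `PointsSoA`, `to_soa`, `squared_lengths`) are invented for the example, and the loop is plain C++ rather than intrinsics; the point is only that in the second layout each loop reads contiguous floats of one kind, which is what a 4-wide (SSE) or 8-wide (AVX) load wants.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: each point's X, Y, Z, W sit next to each other.
struct PointAoS { float x, y, z, w; };

// Structure of arrays: all X values together, all Y values together, and so on.
struct PointsSoA
{
    std::vector<float> x, y, z, w;
};

// Convert from one layout to the other. This is the "swizzling" cost,
// ideally paid once up front rather than around every calculation.
PointsSoA to_soa(const std::vector<PointAoS> &pts)
{
    PointsSoA out;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        out.x.push_back(pts[i].x);
        out.y.push_back(pts[i].y);
        out.z.push_back(pts[i].z);
        out.w.push_back(pts[i].w);
    }
    return out;
}

// Squared length of every point; each input array is walked contiguously.
void squared_lengths(const PointsSoA &p, std::vector<float> &out)
{
    assert(p.x.size() == p.y.size() && p.y.size() == p.z.size());
    out.resize(p.x.size());
    for (std::size_t i = 0; i < p.x.size(); ++i)
        out[i] = p.x[i] * p.x[i] + p.y[i] * p.y[i] + p.z[i] * p.z[i];
}
```

Which layout wins depends on the calculation: operations that touch one whole point at a time (such as a 4x4 matrix transform of a single vertex) can suit the first layout, while operations applied uniformly across many points tend to suit the second.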

Mats Petersson answered Sep 21 '22 16:09
