
How to write a matrix matrix product that can compete with Eigen?

Below is a C++ program comparing the time taken by Eigen and by a hand-written for loop to perform a matrix-matrix product. The for loop has been optimised to minimise cache misses. It is faster than Eigen for small sizes but eventually becomes slower (up to a factor of 2 for 500-by-500 matrices). What else should I do to compete with Eigen? Is blocking the reason for Eigen's better performance? If so, how should I go about adding blocking to the for loop?

#include<iostream>
#include<Eigen/Dense>
#include<ctime>
#include<cstdlib>   //  for atoi, srand
#include<vector>

int main(int argc, char* argv[]) {
    srand(time(NULL));
    //  Input the size of the matrix from the user
    int N   =   atoi(argv[1]);

    int M   =   N*N;

    //  The matrices stored as row-major vectors (heap-allocated: stack
    //  arrays of runtime size are non-standard C++ and overflow for large N)
    std::vector<double> a(M);
    std::vector<double> b(M);
    std::vector<double> c(M);

    //  Initializing Eigen Matrices
    Eigen::MatrixXd a_E =   Eigen::MatrixXd::Random(N,N);
    Eigen::MatrixXd b_E =   Eigen::MatrixXd::Random(N,N);
    Eigen::MatrixXd c_E(N,N);

    double CPS  =   CLOCKS_PER_SEC;

    clock_t start, end;


    //  Matrix-matrix product by Eigen
    start   =   clock();
    c_E     =   a_E*b_E;
    end     =   clock();

    std::cout << "\nTime taken by Eigen is: " << (end-start)/CPS << "\n";

    //  Initializing For loop vectors
    int count   =   0;
    for (int j=0; j<N; ++j) {
        for (int k=0; k<N; ++k) {
            a[count]    =   a_E(j,k);
            b[count]    =   b_E(j,k);
            ++count;
        }
    }

    //  Matrix-matrix product by for loop
    start   =   clock();
    count   =   0;
    int count1, count2;
    for (int j=0; j<N; ++j) {
        count1  =   j*N;
        for (int k=0; k<N; ++k) {
            c[count]    =   a[count1]*b[k];
            ++count;
        }
    }

    for (int j=0; j<N; ++j) {
        count2  =   N;
        for (int l=1; l<N; ++l) {
            count   =   j*N;
            count1  =   count+l;
            for (int k=0; k<N; ++k) {
                c[count]+=a[count1]*b[count2];
                ++count;
                ++count2;
            }
        }
    }
    end     =   clock();

    std::cout << "\nTime taken by for-loop is: " << (end-start)/CPS << "\n";
}
Adhvaitha asked Feb 25 '16


1 Answer

Your code is already well vectorized by the compiler. The key to higher performance is hierarchical blocking to optimize the use of registers and of the different levels of cache. Partial loop unrolling is also crucial to improve instruction pipelining. Reaching the performance of Eigen's product requires a lot of effort and tuning.

It should also be noted that your benchmark is slightly biased and not fully reliable. Here is a more reliable version (you need Eigen's complete sources to get bench/BenchTimer.h):

#include<iostream>
#include<Eigen/Dense>
#include<bench/BenchTimer.h>
#include<cstdlib>   //  for atoi

// Your row-major product, unchanged: a first pass writes the outer product
// c = a(:,0)*b(0,:), then a second pass accumulates the remaining N-1
// rank-1 updates c(j,:) += a(j,l)*b(l,:).
void myprod(double *c, const double* a, const double* b, int N) {
  int count = 0;
  int count1, count2;
  for (int j=0; j<N; ++j) {
      count1  =   j*N;
      for (int k=0; k<N; ++k) {
          c[count]    =   a[count1]*b[k];
          ++count;
      }
  }

  for (int j=0; j<N; ++j) {
      count2  =   N;
      for (int l=1; l<N; ++l) {
          count   =   j*N;
          count1  =   count+l;
          for (int k=0; k<N; ++k) {
              c[count]+=a[count1]*b[count2];
              ++count;
              ++count2;
          }
      }
  }
}

int main(int argc, char* argv[]) {
  int N = atoi(argv[1]);
  int tries = 4;
  int rep = std::max<int>(1,10000000/N/N/N);

  Eigen::MatrixXd a_E = Eigen::MatrixXd::Random(N,N);
  Eigen::MatrixXd b_E = Eigen::MatrixXd::Random(N,N);
  Eigen::MatrixXd c_E(N,N);

  Eigen::BenchTimer t1, t2;

  BENCH(t1, tries, rep, c_E.noalias() = a_E*b_E );
  BENCH(t2, tries, rep, myprod(c_E.data(), a_E.data(), b_E.data(), N));

  std::cout << "\nTime taken by Eigen is: " << t1.best() << "\n";
  std::cout << "\nTime taken by for-loop is: " << t2.best() << "\n";
}

Compiling with Eigen 3.3-beta1 and FMA enabled (-mfma), the gap becomes much larger: almost one order of magnitude for N=2000.
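For reference, a compile line along these lines should reproduce this (the `eigen` directory and the source file name are assumptions; adjust to your checkout of Eigen's source tree, whose root contains both `Eigen/` and `bench/`):

```shell
# -O3 for vectorization, -mfma to enable fused multiply-add
g++ -O3 -mfma -I eigen bench_prod.cpp -o bench_prod
./bench_prod 2000
```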

ggael answered Oct 12 '22
