I would like to compare the speed of Matlab in matrix multiplication with the speed of Eigen 3 on an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz. The code using Eigen:
#include <iostream>
#include "Eigen/Dense"
#include <chrono>
#include <omp.h>

using namespace std;
using namespace Eigen;

const int dim = 100;

int main()
{
    std::chrono::time_point<std::chrono::system_clock> start, end;
    int n;
    n = Eigen::nbThreads();
    cout << n << "\n";

    Matrix<double, Dynamic, Dynamic> m1(dim, dim);
    Matrix<double, Dynamic, Dynamic> m2(dim, dim);
    Matrix<double, Dynamic, Dynamic> m_res(dim, dim);

    start = std::chrono::system_clock::now();
    for (int i = 0; i < 100000; ++i) {
        m1.setRandom(dim, dim);
        m2.setRandom(dim, dim);
        m_res = m1 * m2;
    }
    end = std::chrono::system_clock::now();

    std::chrono::duration<double> elapsed_seconds = end - start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
    return 0;
}
It is compiled with g++ -O3 -std=c++11 -fopenmp and executed with OMP_NUM_THREADS=8 ./prog.
In Matlab I'm using
function mat_test(N,dim)
% N:   how many tests
% dim: dimension of the matrices
tic
parfor i = 1:N
    A = rand(dim);
    B = rand(dim);
    C = A*B;
end
toc
The result is 9 s for Matlab and 36 s for Eigen. What am I doing wrong in the Eigen case? I can rule out the dynamic allocation of the matrices. Also, only 3 threads are used instead of 8.
EDIT:
Maybe I didn't state it clearly enough: the task is to multiply, 100000 times, double-valued matrices of dimension dim=100 that are randomly filled each time, not only once. Do it as fast as possible with Eigen. If Eigen cannot keep up with Matlab, what alternative would you suggest?
The problem is that it is slower than Matlab. It reports about 8 seconds on average. Compiled with -O3 and no debug symbols on Ubuntu 16.04 with g++ 6.4.
Because MATLAB is a programming language that was initially developed for numerical linear algebra (matrix manipulation), it comes with libraries that are specifically tuned for matrix multiplication. On top of that, MATLAB can nowadays also use GPUs (graphics processing units) for this.
For operations involving complex expressions, Eigen is inherently faster than any BLAS implementation because it can handle and optimize a whole operation globally -- while BLAS forces the programmer to split complex operations into small steps that match the BLAS fixed-function API, which incurs inefficiency due to ...
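To make that concrete, here is a minimal sketch (not part of either benchmark; the vector names are purely illustrative): the compound expression below is turned by Eigen's expression templates into a single fused loop over the data, whereas a fixed-function BLAS interface would force several separate calls with intermediate temporaries.
#include "Eigen/Dense"
#include <iostream>

int main()
{
    // Illustrative only: three vectors combined in one compound expression.
    Eigen::VectorXd a = Eigen::VectorXd::Random(1000);
    Eigen::VectorXd b = Eigen::VectorXd::Random(1000);
    Eigen::VectorXd c = Eigen::VectorXd::Random(1000);

    // Eigen evaluates the whole right-hand side in one pass, with no
    // temporaries for the intermediate sums.
    Eigen::VectorXd d = 2.0 * a + 3.0 * b - c;

    std::cout << d.sum() << "\n";
    return 0;
}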
Below is a better version of your code making fair use of Eigen. To summarize:
- Move setRandom() outside the benchmarking loop: setRandom() calls the system rand() function, which is rather slow.
- Use .noalias() to avoid the creation of a temporary (this only makes sense when the right-hand side is a product).
- Enable the -mavx and -mfma compiler options (about a x3.5 speedup compared to SSE only); see the compile command after the code below.
The code:
#include <iostream>
#include "Eigen/Dense"
#include <chrono>

using namespace std;
using namespace Eigen;

const int dim = 100;

int main()
{
    std::chrono::time_point<std::chrono::system_clock> start, end;
    int n;
    n = Eigen::nbThreads();
    cout << n << "\n";

    Matrix<double, Dynamic, Dynamic> m1(dim, dim);
    Matrix<double, Dynamic, Dynamic> m2(dim, dim);
    Matrix<double, Dynamic, Dynamic> m_res(dim, dim);

    start = std::chrono::system_clock::now();
    // Fill the inputs once, outside the benchmarking loop.
    m1.setRandom();
    m2.setRandom();
    for (int i = 0; i < 100000; ++i) {
        // noalias() avoids the temporary a plain product assignment would create.
        m_res.noalias() = m1 * m2;
    }
    end = std::chrono::system_clock::now();

    std::chrono::duration<double> elapsed_seconds = end - start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
    return 0;
}
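For completeness, a compile-and-run command matching the one from the question plus the suggested vectorization flags could look like the lines below (the file name prog.cpp is just a placeholder; the i7-4770 is a Haswell part, so it supports both AVX and FMA):
g++ -O3 -std=c++11 -fopenmp -mavx -mfma prog.cpp -o prog
OMP_NUM_THREADS=8 ./prog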