At Eigen version I use "true" fixed size matrices and vectors, better algorithm (LDLT versus LU at uBlas), it uses SIMD instructions internally. So, why it is slower than uBlas on following example?
I am sure, I am doing something wrong - Eigen MUST be faster, or at least comparable.
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/lu.hpp>
#include <boost/numeric/ublas/symmetric.hpp>
#include <boost/progress.hpp>
#include <Eigen/Dense>
#include <iostream>
using namespace boost;
using namespace std;
const int n=9;
const int total=100000;
void test_ublas()
{
using namespace boost::numeric::ublas;
cout << "Boost.ublas ";
double r=1.0;
{
boost::progress_timer t;
for(int j=0;j!=total;++j)
{
//symmetric_matrix< double,lower,row_major,bounded_array<double,(1+n)*n/2> > A(n,n);
matrix<double,row_major,bounded_array<double,n*n> > A(n,n);
permutation_matrix< unsigned char,bounded_array<unsigned char,n> > P(n);
bounded_vector<double,n> v;
for(int i=0;i!=n;++i)
for(int k=0;k!=n;++k)
A(i,k)=0.0;
for(int i=0;i!=n;++i)
{
A(i,i)=1.0+i;
v[i]=i;
}
lu_factorize(A,P);
lu_substitute(A,P,v);
r+=inner_prod(v,v);
}
}
cout << r << endl;
}
void test_eigen()
{
using namespace Eigen;
cout << "Eigen ";
double r=1.0;
{
boost::progress_timer t;
for(int j=0;j!=total;++j)
{
Matrix<double,n,n> A;
Matrix<double,n,1> b;
for(int i=0;i!=n;++i)
{
A(i,i)=1.0+i;
b[i]=i;
}
Matrix<double,n,1> x=A.ldlt().solve(b);
r+=x.dot(x);
}
}
cout << r << endl;
}
int main()
{
test_ublas();
test_eigen();
}
Output is:
Boost.ublas 0.50 s
488184
Eigen 2.66 s
488184
(Visual Studio 2010 x64 Release)
EDIT:
For
const int n=4;
const int total=1000000;
Output is:
Boost.ublas 0.67 s
1.25695e+006
Eigen 0.40 s
5.4e+007
I guess, such behaviour is due to uBlas version computes factorization in-place, while Eigen version creates COPY of matrix (LDLT) - so it fits cache worse.
Is there any way to do inplace computation in Eigen? Or maybe there are other ways to improve it?
EDIT:
Following Fezvez advice and use LLT instead of LDLT I get:
Eigen 0.16 s
488184
It is good, but it still does unnecesary matrix stack allocation:
sizeof(A.llt()) == 656
I prefer to avoid it - it should be even faster.
EDIT:
I have removed allocation, by subclassing from LDLT (it's internal matrix is protected), and filling it directly. Now result for LDLT is:
Eigen 0.26 s
488209
It works, but it is workaround - not a real solution...
Subclassing from LLT also works, but does not provide such great effect.
You benchmark is not fair because the ublas version solve inplace, while the Eigen version could be easily adjusted to do so:
b=A.ldlt().solve(b);
r+=b.dot(b);
Compiling with g++-4.6 -O2 -DNDEBUG, I get (on a 2.3GHz CPU):
Boost.ublas 0.15 s
488184
Eigen 0.08 s
488184
Also make sure you compiled with optimization enabled, and SSE2 enabled if you are running a 32 bit system or (32 bits compilation chain).
EDIT: I also tried to avoid the matrix copy, but this results in zero gain at all.
Also, increasing n, increase the speed difference (here n=90):
Boost.ublas 0.47 s
Eigen 0.13 s
I followed your hint about in place computation:
Using the exact same code, and using the functions A.llt().solveInPlace(b);
and A.ldlt().solveInPlace(b);
(which replaces b by x), I got that
There were 100000 Eigen standard LDLT linear solvers applied in 12.658 seconds
There were 100000 Eigen standard LLT linear solvers applied in 4.652 seconds
There were 100000 Eigen in place LDLT linear solvers applied in 12.7 seconds
There were 100000 Eigen in place LLT linear solvers applied in 4.396 seconds
Perhaps an LLT solver is just more appropriate than a LDLT solver for this kind of problem? (I can see that you're dealing with your matrices of size 9)
(By the way, I answered a bit that previous question you had about your linear solver in dimension 9, and I am very sad to see that Eigen's implementation of LDLT has a lot of overhead...)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With