Why Eigen is 5x slower than ublas on following example?

Question

At Eigen version I use "true" fixed size matrices and vectors, better algorithm (LDLT versus LU at uBlas), it uses SIMD instructions internally. So, why it is slower than uBlas on following example?

I am sure, I am doing something wrong - Eigen MUST be faster, or at least comparable.

#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/lu.hpp>
#include <boost/numeric/ublas/symmetric.hpp>
#include <boost/progress.hpp>
#include <Eigen/Dense>
#include <iostream>

using namespace boost;
using namespace std;
const int n=9;
const int total=100000;

void test_ublas()
{
    using namespace boost::numeric::ublas;
    cout << "Boost.ublas ";
    double r=1.0;
    {
        boost::progress_timer t;
        for(int j=0;j!=total;++j)
        {
            //symmetric_matrix< double,lower,row_major,bounded_array<double,(1+n)*n/2> > A(n,n);
            matrix<double,row_major,bounded_array<double,n*n> > A(n,n);
            permutation_matrix< unsigned char,bounded_array<unsigned char,n> > P(n);
            bounded_vector<double,n> v;
            for(int i=0;i!=n;++i)
                for(int k=0;k!=n;++k)
                    A(i,k)=0.0;
            for(int i=0;i!=n;++i)
            {
                A(i,i)=1.0+i;
                v[i]=i;
            }
            lu_factorize(A,P);
            lu_substitute(A,P,v);
            r+=inner_prod(v,v);
        }
    }
    cout << r << endl;
}

void test_eigen()
{
    using namespace Eigen;
    cout << "Eigen ";
    double r=1.0;
    {
        boost::progress_timer t;
        for(int j=0;j!=total;++j)
        {
            Matrix<double,n,n> A;
            Matrix<double,n,1> b;
            for(int i=0;i!=n;++i)
            {
                A(i,i)=1.0+i;
                b[i]=i;
            }
            Matrix<double,n,1> x=A.ldlt().solve(b);
            r+=x.dot(x);
        }
    }
    cout << r << endl;
}

int main()
{
    test_ublas();
    test_eigen();
}

Output is:

Boost.ublas 0.50 s

488184
Eigen 2.66 s

488184

(Visual Studio 2010 x64 Release)

EDIT:

For

const int n=4;
const int total=1000000;

Output is:

Boost.ublas 0.67 s

1.25695e+006
Eigen 0.40 s

5.4e+007

I guess, such behaviour is due to uBlas version computes factorization in-place, while Eigen version creates COPY of matrix (LDLT) - so it fits cache worse.

Is there any way to do inplace computation in Eigen? Or maybe there are other ways to improve it?

EDIT:

Following Fezvez advice and use LLT instead of LDLT I get:

Eigen 0.16 s

488184

It is good, but it still does unnecesary matrix stack allocation:

sizeof(A.llt()) == 656

I prefer to avoid it - it should be even faster.

EDIT:

I have removed allocation, by subclassing from LDLT (it's internal matrix is protected), and filling it directly. Now result for LDLT is:

Eigen 0.26 s
488209

It works, but it is workaround - not a real solution...

Subclassing from LLT also works, but does not provide such great effect.

ggael · Accepted Answer

You benchmark is not fair because the ublas version solve inplace, while the Eigen version could be easily adjusted to do so:

b=A.ldlt().solve(b);
r+=b.dot(b);

Compiling with g++-4.6 -O2 -DNDEBUG, I get (on a 2.3GHz CPU):

Boost.ublas 0.15 s
488184

Eigen 0.08 s
488184

Also make sure you compiled with optimization enabled, and SSE2 enabled if you are running a 32 bit system or (32 bits compilation chain).

EDIT: I also tried to avoid the matrix copy, but this results in zero gain at all.

Also, increasing n, increase the speed difference (here n=90):

Boost.ublas 0.47 s
Eigen 0.13 s

B. Decoster · Answer

I followed your hint about in place computation:

Using the exact same code, and using the functions A.llt().solveInPlace(b); and A.ldlt().solveInPlace(b); (which replaces b by x), I got that

There were  100000  Eigen standard LDLT linear solvers applied in  12.658  seconds 
There were  100000  Eigen standard LLT linear solvers applied in  4.652  seconds 

There were  100000  Eigen in place LDLT linear solvers applied in  12.7  seconds 
There were  100000  Eigen in place LLT linear solvers applied in  4.396  seconds

Perhaps an LLT solver is just more appropriate than a LDLT solver for this kind of problem? (I can see that you're dealing with your matrices of size 9)

(By the way, I answered a bit that previous question you had about your linear solver in dimension 9, and I am very sad to see that Eigen's implementation of LDLT has a lot of overhead...)

Why Eigen is 5x slower than ublas on following example?

Tags:

c++

algorithm

linear-algebra

eigen

ublas

qble

2 Answers

ggael

B. Decoster

Recent Activity

Donate For Us

Why Eigen is 5x slower than ublas on following example?

Tags:

c++

algorithm

linear-algebra

eigen

ublas

qble

2 Answers

ggael

B. Decoster

Related questions

Recent Activity

Donate For Us