I'm using the Cholesky module of Eigen 3 to solve a linear equation system. The Eigen documentation states that using LDLT instead of LLT would be faster for this purpose, but my benchmarks show a different result.
I am using the following code for benchmarking:
#include <iostream>
#include <chrono>
#include <Eigen/Core>
#include <Eigen/Cholesky>

using namespace std;
using namespace std::chrono;
using namespace Eigen;

int main()
{
    // Build a symmetric matrix with a strongly boosted diagonal.
    MatrixXf cov = MatrixXf::Random(4200, 4200);
    cov = (cov + cov.transpose()) + 1000 * MatrixXf::Identity(4200, 4200);
    VectorXf b = VectorXf::Random(4200), r1, r2;

    // LLT: time the decomposition and the solve separately.
    r1 = b;
    LLT<MatrixXf> llt;
    auto start = high_resolution_clock::now();
    llt.compute(cov);
    if (llt.info() != Success)
    {
        cout << "Error on LLT!" << endl;
        return 1;
    }
    auto middle = high_resolution_clock::now();
    llt.solveInPlace(r1);
    auto stop = high_resolution_clock::now();
    cout << "LLT decomposition & solving in " << duration_cast<milliseconds>(middle - start).count()
         << " + " << duration_cast<milliseconds>(stop - middle).count() << " ms." << endl;

    // LDLT: same measurement on the same matrix.
    r2 = b;
    LDLT<MatrixXf> ldlt;
    start = high_resolution_clock::now();
    ldlt.compute(cov);
    if (ldlt.info() != Success)
    {
        cout << "Error on LDLT!" << endl;
        return 1;
    }
    middle = high_resolution_clock::now();
    ldlt.solveInPlace(r2);
    stop = high_resolution_clock::now();
    // Decomposition time is middle - start, matching the LLT measurement.
    cout << "LDLT decomposition & solving in " << duration_cast<milliseconds>(middle - start).count()
         << " + " << duration_cast<milliseconds>(stop - middle).count() << " ms." << endl;

    cout << "Total result difference: " << (r2 - r1).cwiseAbs().sum() << endl;
    return 0;
}
I've compiled it with g++ -std=c++11 -O2 -o llt.exe llt.cc on Windows and this is what I get:
LLT decomposition & solving in 6515 + 15 ms.
LDLT decomposition & solving in 8562 + 15 ms.
Total result difference: 1.27354e-006
So, why is LDLT slower than LLT? Am I doing something wrong, or do I misunderstand the documentation?
This sentence of the documentation is outdated. With a recent version of Eigen, LLT should be much faster than LDLT for fairly large matrices, because the LLT implementation leverages cache-friendly matrix-matrix operations, while the LDLT implementation involves pivoting and uses only matrix-vector operations. With the devel branch, your example gives me:
LLT decomposition & solving in 380 + 4 ms.
LDLT decomposition & solving in 2746 + 4 ms.
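To make the relationship between the two factorizations concrete, here is a minimal textbook sketch of both in plain C++ without Eigen. It is not Eigen's implementation: as described above, Eigen's LLT is blocked and its LDLT pivots, whereas this sketch is unblocked and unpivoted and assumes a symmetric positive definite input. The names choleskyLLT, choleskyLDLT, solveLLT and solveLDLT are hypothetical helpers chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

// Unblocked LL^T: returns lower-triangular L with A = L * L^T.
// Assumes A is symmetric positive definite; no pivoting is done.
Matrix choleskyLLT(const Matrix& A) {
    const std::size_t n = A.size();
    Matrix L(n, Vector(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double diag = A[j][j];
        for (std::size_t k = 0; k < j; ++k) diag -= L[j][k] * L[j][k];
        assert(diag > 0.0);  // fails if A is not positive definite
        L[j][j] = std::sqrt(diag);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    return L;
}

// Unpivoted LDL^T: unit lower-triangular L and diagonal D with A = L * D * L^T.
// No square roots are needed, unlike LL^T.
void choleskyLDLT(const Matrix& A, Matrix& L, Vector& D) {
    const std::size_t n = A.size();
    L.assign(n, Vector(n, 0.0));
    D.assign(n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        double d = A[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k] * D[k];
        assert(d != 0.0);  // a zero pivot would require pivoting
        D[j] = d;
        L[j][j] = 1.0;
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k] * D[k];
            L[i][j] = s / d;
        }
    }
}

// Solves L * y = b by forward substitution.
Vector forwardSub(const Matrix& L, const Vector& b) {
    const std::size_t n = b.size();
    Vector y(n);
    for (std::size_t i = 0; i < n; ++i) {
        double s = b[i];
        for (std::size_t k = 0; k < i; ++k) s -= L[i][k] * y[k];
        y[i] = s / L[i][i];
    }
    return y;
}

// Solves L^T * x = y by backward substitution.
Vector backwardSubT(const Matrix& L, const Vector& y) {
    const std::size_t n = y.size();
    Vector x(n);
    for (std::size_t i = n; i-- > 0;) {
        double s = y[i];
        for (std::size_t k = i + 1; k < n; ++k) s -= L[k][i] * x[k];
        x[i] = s / L[i][i];
    }
    return x;
}

// Solves A * x = b given A = L * L^T.
Vector solveLLT(const Matrix& L, const Vector& b) {
    return backwardSubT(L, forwardSub(L, b));
}

// Solves A * x = b given A = L * D * L^T (L has a unit diagonal).
Vector solveLDLT(const Matrix& L, const Vector& D, const Vector& b) {
    Vector y = forwardSub(L, b);
    for (std::size_t i = 0; i < y.size(); ++i) y[i] /= D[i];
    return backwardSubT(L, y);
}
```

Both routines do the same O(n^3) amount of arithmetic here, so any speed difference seen in practice comes from how a library organizes that work (blocking, pivoting), not from the factorization formulas themselves.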