This is a rather theoretical question, but I'm quite interested in it and would be glad if someone with expert knowledge on the subject is willing to share it.
I have a matrix of floats with 2000 rows and 600 cols and want to subtract the mean of the columns from each row. I have tested the following two lines and compared their runtime:
MatrixXf centered = data.rowwise() - (data.colwise().sum() / data.rows());
MatrixXf centered = data.rowwise() - data.colwise().mean();
I assumed mean() would not do anything different from dividing the sum of each column by the number of rows, but while the first line takes 12.3 seconds on my computer, the second finishes in 0.09 seconds.
I'm using Eigen version 3.2.6, which is currently the latest version, and my matrices are stored in row-major order.
Does anyone know anything about Eigen's internals that could explain this huge performance difference?
Edit: I should add that data in the code above is actually of type Eigen::Map< Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> > and maps Eigen's functionality onto a raw buffer.
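For context, here is a minimal sketch of how such a mapping over a raw buffer can look (the buffer name and sizes are illustrative, not from my actual code):
#include <Eigen/Core>

int main()
{
    const int rows = 2000, cols = 600;
    // Hypothetical raw buffer holding rows * cols floats in row-major order.
    float * buffer = new float[rows * cols]();
    // Note: the storage-order template argument goes on Eigen::Matrix itself;
    // MatrixXf is already a fully specified typedef and takes no parameters.
    typedef Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> RowMajorMatrixXf;
    // Wrap the buffer without copying; data can then be used like a regular matrix.
    Eigen::Map<RowMajorMatrixXf> data(buffer, rows, cols);
    Eigen::MatrixXf centered = data.rowwise() - data.colwise().mean();
    delete[] buffer;
    return 0;
}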
Edit 2: As suggested by GuyGreer, I'll provide some sample code to reproduce my findings:
#include <iostream>
#include <chrono>
#include <Eigen/Core>
using namespace std;
using namespace std::chrono;
using namespace Eigen;
int main(int argc, char * argv[])
{
    MatrixXf data(10000, 1000), centered;
    data.setRandom();
    auto start = high_resolution_clock::now();
    if (argc > 1)
        centered = data.rowwise() - data.colwise().mean();                  // fast: mean()
    else
        centered = data.rowwise() - (data.colwise().sum() / data.rows());   // slow: sum() / rows()
    auto stop = high_resolution_clock::now();
    cout << duration_cast<milliseconds>(stop - start).count() << " ms" << endl;
    return 0;
}
Compile with:
g++ -O3 -std=c++11 -o test test.cc
Running the resulting program without arguments, so that it uses sum(), takes 126 seconds on my machine, while running test 1, which uses mean(), takes only 0.03 seconds!
Edit 3: As it turned out (see comments), it is not sum() that takes so long, but the division of the resulting vector by the number of rows. So the new question is: why does Eigen take more than two minutes to divide a vector of 1000 entries by a single scalar?
Somehow, both the partial reduction (sum) and the division are recomputed every time, because some crucial information about the evaluation cost of the partial reduction is wrongly lost by operator/... Explicitly evaluating the mean fixes the issue:
centered = data.rowwise() - (data.colwise().sum() / data.rows()).eval();
Of course, this evaluation should be done by Eigen for you, as fixed by changeset 42ab43a. This fix will be part of the next 3.2.7 and 3.3 releases.
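For versions without this fix, an equivalent workaround (my own sketch, the same idea as the .eval() call above) is to evaluate the means into a named temporary, so the reduction and division run exactly once:
#include <Eigen/Core>
using namespace Eigen;

int main()
{
    MatrixXf data(10000, 1000), centered;
    data.setRandom();
    // Assigning to a concrete RowVectorXf forces evaluation here, instead of
    // leaving a lazy expression that would be re-reduced and re-divided for
    // every coefficient of the rowwise subtraction.
    RowVectorXf mean = data.colwise().sum() / data.rows();
    centered = data.rowwise() - mean;
    return 0;
}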