I need to write to a file row by row of matrices and sparse matrices that appears in a list and I am doing something like this:
#include <RcppArmadillo.h>
// [[Rcpp::export]]
bool write_rows (Rcpp::List data, Rcpp::CharacterVector clss, int n) {
int len = data.length();
for(int i = 0; i<n; i++) {
for(int j=0; j<len; j++) {
if (clss[j] == "matrix") {
Rcpp::NumericMatrix x = data[j];
auto row = x.row(i);
// do something with row i
} else if (clss[j] == "dgCMatrix") {
arma::sp_mat x = data[j];
auto row = x.row(i);
// do something different with row i
}
}
}
return true;
}
This function can be called in R with:
data <- list(
x = Matrix::rsparsematrix(nrow = 1000, ncol = 1000, density = 0.3),
y = matrix(1:10000, nrow = 1000, ncol = 10)
)
clss <- c("dgCMatrix", "matrix")
write_rows(data, clss, 1000)
The function receives a list of matrices or sparse matrices with the same number of rows and writes those matrices row by row, ie. first writes first rows of all elements in data
then the second row of all elements and etc.
My problem is that it seems that this line arma::sp_mat x = data[i];
seems to have a huge impact in performance since it seems that I am implicitly casting the list element data[j]
to an Armadillo Sparse Matrix n
times.
My question is: is there anyway I could avoid this? Is there a more efficient solution? I tried to find a solution by looking into readr
's source code, since they also write list elements row by row, but they also do a cast for each row (in this line for example, but maybe this doesn't impact the performance because they deal with SEXPS?
With the clarification, it seems that the result should interleave the rows from each matrix. You can still do this while avoiding multiple conversions.
This is the original code, modified to generate some actual output:
// [[Rcpp::export]]
arma::mat write_rows(Rcpp::List data, Rcpp::CharacterVector clss, int nrows, int ncols) {
int len = data.length();
arma::mat result(nrows*len, ncols);
for (int i = 0, k = 0; i < nrows; i++) {
for (int j = 0; j < len; j++) {
arma::rowvec r;
if (clss[j] == "matrix") {
Rcpp::NumericMatrix x = data[j];
r = x.row(i);
}
else {
arma::sp_mat x = data[j];
r = x.row(i);
}
result.row(k++) = r;
}
}
return result;
}
The following code creates a vector of converted objects, and then extracts the rows from each object as required. The conversion is only done once per matrix. I use a struct containing a dense and sparse mat because it's a lot simpler than dealing with unions; and I don't want to drag in boost::variant
or require C++17. Since there's only 2 classes we want to deal with, the overhead is minimal.
struct Matrix_types {
arma::mat m;
arma::sp_mat M;
};
// [[Rcpp::export]]
arma::mat write_rows2(Rcpp::List data, Rcpp::CharacterVector clss, int nrows, int ncols) {
const int len = data.length();
std::vector<Matrix_types> matr(len);
std::vector<bool> is_dense(len);
arma::mat result(nrows*len, ncols);
// populate the structs
for (int j = 0; j < len; j++) {
is_dense[j] = (clss[j] == "matrix");
if (is_dense[j]) {
matr[j].m = Rcpp::as<arma::mat>(data[j]);
}
else {
matr[j].M = Rcpp::as<arma::sp_mat>(data[j]);
}
}
// populate the result
for (int i = 0, k = 0; i < nrows; i++) {
for (int j = 0; j < len; j++, k++) {
if (is_dense[j]) {
result.row(k) = matr[j].m.row(i);
}
else {
arma::rowvec r(matr[j].M.row(i));
result.row(k) = r;
}
}
}
return result;
}
Running on some test data:
data <- list(
a=Matrix(1.0, 1000, 1000, sparse=TRUE),
b=matrix(2.0, 1000, 1000),
c=Matrix(3.0, 1000, 1000, sparse=TRUE),
d=matrix(4.0, 1000, 1000)
)
system.time(z <- write_rows(data, sapply(data, class), 1000, 1000))
# user system elapsed
# 185.75 35.04 221.38
system.time(z2 <- write_rows2(data, sapply(data, class), 1000, 1000))
# user system elapsed
# 4.21 0.05 4.25
identical(z, z2)
# [1] TRUE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With