I will be working in R with quite a large (7e6 x 4.5e3) but very sparse matrix, so I am trying to understand how to deal with sparse matrices efficiently. I have two related questions.
First: I have been given to understand that the Matrix package links automatically to the compiled LAPACK and SuiteSparse DLLs. (I am working on Windows.) It was my impression that use of the SuiteSparse routines would shorten execution time compared with dense-matrix operations through LAPACK. But the test run below suggests that the sparse version of the matrix is much slower than the dense version.
> library(Matrix)
> sparse <- sparseMatrix(1:4, 1:4, x=rnorm(4))
> dense <- as.matrix(sparse)
> x <- 1:4
> system.time(for (i in 1:10000) sparse %*% x)
user system elapsed
0.23 0.00 0.23
> system.time(for (i in 1:10000) dense %*% x)
user system elapsed
0 0 0
> system.time(for (i in 1:1000) solve(sparse))
user system elapsed
3.94 0.00 3.94
> system.time(for (i in 1:1000) solve(dense))
user system elapsed
0.05 0.00 0.05
a) Am I correct that Matrix connects automatically with the above two compiled libraries? If not, how do I link to these DLLs?
b) Is the use of sparse matrix algebra in general actually that much slower than use of dense matrix algebra?
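For reference, here is one way to check which BLAS/LAPACK builds an R session is actually using (all base-R functions; the "BLAS" entry of extSoftVersion() is only present in newer R versions):
sessionInfo()     ## newer R versions print the BLAS and LAPACK libraries in use
La_version()      ## version of the LAPACK that R was built against
extSoftVersion()  ## on newer R versions this includes a "BLAS" entry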
Second: I have installed the RcppEigen and RcppArmadillo packages. I have been able to compile a test program with RcppArmadillo (using the paper by Dirk Eddelbuettel and Conrad Sanderson). But I have not, for the life of me, been able to find a similar introduction to RcppEigen that would give me some model code I could use to get started. Can any of you point me to a document, similar to the Eddelbuettel and Sanderson paper, that can help me get started with RcppEigen?
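To make it concrete, the kind of minimal model code I am after would be something along these lines (an untested sketch; denseProd and sparseProd are just placeholder names), compiled with Rcpp::sourceCpp():
library(Rcpp)
library(Matrix)
sourceCpp(code = '
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>

// dense A %*% x, mapping the R objects without copying
// [[Rcpp::export]]
Eigen::VectorXd denseProd(const Eigen::Map<Eigen::MatrixXd> A,
                          const Eigen::Map<Eigen::VectorXd> x) {
  return A * x;
}

// sparse A %*% x, accepting a dgCMatrix from the Matrix package
// (MappedSparseMatrix may be flagged as deprecated by newer Eigen versions)
// [[Rcpp::export]]
Eigen::VectorXd sparseProd(const Eigen::MappedSparseMatrix<double> A,
                           const Eigen::Map<Eigen::VectorXd> x) {
  return A * x;
}
')
denseProd(dense, as.numeric(x))    ## should match as.vector(dense %*% x)
sparseProd(sparse, as.numeric(x))  ## should match as.vector(sparse %*% x)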
(A little too long for a comment.) I would start by profiling this with much larger matrices; I can imagine that the sparse algorithms are at a disadvantage when the matrix is small and not very sparse (e.g. in this case 25% of the cells are non-zero). In the example below (1000x1000 matrices), the sparse solver is about 26 times faster than the dense solver. You may find that the Matrix routines are fast enough for your purposes, without taking on the additional cognitive overhead of learning (Rcpp)Eigen/(Rcpp)Armadillo ...
library(rbenchmark)
library(Matrix)
set.seed(101)
sparse <- sparseMatrix(1:1000,1:1000,x=rnorm(1000))
dense <- as.matrix(sparse)
benchmark(solve(sparse),solve(dense),replications=20,
columns = c(
"test", "replications", "elapsed", "relative", "user.self"))
## test replications elapsed relative user.self
## 2 solve(dense) 20 6.932 26.868 6.692
## 1 solve(sparse) 20 0.258 1.000 0.256
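If you want something closer to the scale in the question, a sketch along these lines (the density here is just an assumption; substitute something representative of your data) compares matrix-vector products at a scaled-down size that still fits in memory as a dense copy:
M <- rsparsematrix(5000, 4500, density = 0.001)  ## random sparse test matrix
D <- as.matrix(M)                                ## dense copy, ~180 MB
x <- rnorm(4500)
benchmark(as.vector(M %*% x), as.vector(D %*% x),
          replications = 100,
          columns = c("test", "replications", "elapsed", "relative"))
(For a non-square matrix like yours, matrix-vector products and crossprod() are the relevant operations; solve() only applies to square matrices.)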