I am trying to create a symmetrical matrix from a lower triangular matrix in R.
In a previous Q & A (Convert upper triangular part of a matrix to symmetric matrix) user 李哲源 said that for large matrices this shouldn't be done in R and proposed a solution in C. But I don't understand C and have never used for example Rccp before so don't know how to interpret the answer. Yet it is obvious that the C code there generates random numbers (rnorm) which I don't want. I want to put in a square matrix and get out a symmetric matrix.
For a given square matrix A that has data in its lower triangle, how would I create a symmetric matrix efficiently in C and use it in R?
Quickly adapted from the dist2mat function in as.matrix on a distance object is extremely slow; how to make it faster?.
library(Rcpp)
cppFunction('NumericMatrix Mat2Sym(NumericMatrix A, bool up2lo, int bf) {
IntegerVector dim = A.attr("dim");
size_t n = (size_t)dim[0], m = (size_t)dim[1];
if (n != m) stop("A is not a square matrix!");
/* use pointers */
size_t j, i, jj, ni, nj;
double *A_jj, *A_ij, *A_ji, *col, *row, *end;
/* cache blocking factor */
size_t b = (size_t)bf;
/* copy lower triangular to upper triangular; cache blocking applied */
for (j = 0; j < n; j += b) {
nj = n - j; if (nj > b) nj = b;
/* diagonal block has size nj x nj */
A_jj = &A(j, j);
for (jj = nj - 1; jj > 0; jj--, A_jj += n + 1) {
/* copy a column segment to a row segment (or vise versa) */
col = A_jj + 1; row = A_jj + n;
for (end = col + jj; col < end; col++, row += n) {
if (up2lo) *col = *row; else *row = *col;
}
}
/* off-diagonal blocks */
for (i = j + nj; i < n; i += b) {
ni = n - i; if (ni > b) ni = b;
/* off-diagonal block has size ni x nj */
A_ij = &A(i, j); A_ji = &A(j, i);
for (jj = 0; jj < nj; jj++) {
/* copy a column segment to a row segment (or vise versa) */
col = A_ij + jj * n; row = A_ji + jj;
for (end = col + ni; col < end; col++, row += n) {
if (up2lo) *col = *row; else *row = *col;
}
}
}
}
return A;
}')
For a square matrix A, this function Mat2Sym copies its lower triangular part (with transposition) into its upper triangular part to make it symmetric if up2lo = FALSE, and vise versa if up2lo = TRUE.
Note that the function overwrites A without using extra memory. To retain the input matrix and create a new output matrix, pass A + 0 not A into the function.
## an arbitrary asymmetric square matrix
set.seed(0)
A <- matrix(runif(25), 5)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.8966972 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.2655087 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.3721239 0.9446753 0.17655675 0.7176185 0.2121425
#[4,] 0.5728534 0.6607978 0.68702285 0.9919061 0.6516738
#[5,] 0.9082078 0.6291140 0.38410372 0.3800352 0.1255551
## lower triangular to upper triangular; don't overwrite
B <- Mat2Sym(A + 0, up2lo = FALSE, 128)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.8966972 0.2655087 0.3721239 0.5728534 0.9082078
#[2,] 0.2655087 0.8983897 0.9446753 0.6607978 0.6291140
#[3,] 0.3721239 0.9446753 0.1765568 0.6870228 0.3841037
#[4,] 0.5728534 0.6607978 0.6870228 0.9919061 0.3800352
#[5,] 0.9082078 0.6291140 0.3841037 0.3800352 0.1255551
## A is unchanged
A
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.8966972 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.2655087 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.3721239 0.9446753 0.17655675 0.7176185 0.2121425
#[4,] 0.5728534 0.6607978 0.68702285 0.9919061 0.6516738
#[5,] 0.9082078 0.6291140 0.38410372 0.3800352 0.1255551
## upper triangular to lower triangular; overwrite
D <- Mat2Sym(A, up2lo = TRUE, 128)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.89669720 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.20168193 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.06178627 0.2059746 0.17655675 0.7176185 0.2121425
#[4,] 0.76984142 0.4976992 0.71761851 0.9919061 0.6516738
#[5,] 0.77744522 0.9347052 0.21214252 0.6516738 0.1255551
## A has been changed; D and A are aliased in memory
A
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.89669720 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.20168193 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.06178627 0.2059746 0.17655675 0.7176185 0.2121425
#[4,] 0.76984142 0.4976992 0.71761851 0.9919061 0.6516738
#[5,] 0.77744522 0.9347052 0.21214252 0.6516738 0.1255551
On the use of Matrix package
Matrix is particularly useful for sparse matrices. For compatibility it also defines some classes like "dgeMatrix", "dtrMatrix", "dtpMatrix", "dsyMatrix", "dspMatrix" for dense matrices.
Given a square matrix A, the Matrix way to make it symmetric is the following.
set.seed(0)
A <- matrix(runif(25), 5)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.8966972 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.2655087 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.3721239 0.9446753 0.17655675 0.7176185 0.2121425
#[4,] 0.5728534 0.6607978 0.68702285 0.9919061 0.6516738
#[5,] 0.9082078 0.6291140 0.38410372 0.3800352 0.1255551
## equivalent to: Mat2Sym(A + 0, TRUE, 128)
new("dsyMatrix", x = base::c(A), Dim = dim(A), uplo = "U")
#5 x 5 Matrix of class "dsyMatrix"
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.89669720 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.20168193 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.06178627 0.2059746 0.17655675 0.7176185 0.2121425
#[4,] 0.76984142 0.4976992 0.71761851 0.9919061 0.6516738
#[5,] 0.77744522 0.9347052 0.21214252 0.6516738 0.1255551
## equivalent to: Mat2Sym(A + 0, FALSE, 128)
new("dsyMatrix", x = base::c(A), Dim = dim(A), uplo = "L")
#5 x 5 Matrix of class "dsyMatrix"
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.8966972 0.2655087 0.3721239 0.5728534 0.9082078
#[2,] 0.2655087 0.8983897 0.9446753 0.6607978 0.6291140
#[3,] 0.3721239 0.9446753 0.1765568 0.6870228 0.3841037
#[4,] 0.5728534 0.6607978 0.6870228 0.9919061 0.3800352
#[5,] 0.9082078 0.6291140 0.3841037 0.3800352 0.1255551
Matrix method is sub-optimal for three reasons:
x slot as a numeric vector, so we have to do base::c(A) which essentially creates a copy of the matrix in RAM;Here is a quick comparison:
library(bench)
A <- matrix(runif(5000 * 5000), 5000)
bench::mark("Mat2Sym" = Mat2Sym(A, FALSE, 128),
"Matrix" = new("dsyMatrix", x = base::c(A), Dim = dim(A), uplo = "L"),
check = FALSE)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl> <int>
#1 Mat2Sym 56.8ms 57.7ms 57.4ms 59.4ms 17.3 2.48KB 0 9
#2 Matrix 334.3ms 337.4ms 337.4ms 340.6ms 2.96 190.74MB 2 2
Note how fast Mat2Sym is. Also no memory allocation is made in "overwriting" mode.
As G. Grothendieck mentions, we can also use "dspMatrix".
set.seed(0)
A <- matrix(runif(25), 5)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.8966972 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.2655087 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.3721239 0.9446753 0.17655675 0.7176185 0.2121425
#[4,] 0.5728534 0.6607978 0.68702285 0.9919061 0.6516738
#[5,] 0.9082078 0.6291140 0.38410372 0.3800352 0.1255551
## equivalent to: Mat2Sym(A + 0, TRUE, 128)
new("dspMatrix", x = A[upper.tri(A, TRUE)], Dim = dim(A), uplo = "U")
#5 x 5 Matrix of class "dspMatrix"
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.89669720 0.2016819 0.06178627 0.7698414 0.7774452
#[2,] 0.20168193 0.8983897 0.20597457 0.4976992 0.9347052
#[3,] 0.06178627 0.2059746 0.17655675 0.7176185 0.2121425
#[4,] 0.76984142 0.4976992 0.71761851 0.9919061 0.6516738
#[5,] 0.77744522 0.9347052 0.21214252 0.6516738 0.1255551
## equivalent to: Mat2Sym(A + 0, FALSE, 128)
new("dspMatrix", x = A[lower.tri(A, TRUE)], Dim = dim(A), uplo = "L")
#5 x 5 Matrix of class "dspMatrix"
# [,1] [,2] [,3] [,4] [,5]
#[1,] 0.8966972 0.2655087 0.3721239 0.5728534 0.9082078
#[2,] 0.2655087 0.8983897 0.9446753 0.6607978 0.6291140
#[3,] 0.3721239 0.9446753 0.1765568 0.6870228 0.3841037
#[4,] 0.5728534 0.6607978 0.6870228 0.9919061 0.3800352
#[5,] 0.9082078 0.6291140 0.3841037 0.3800352 0.1255551
Again, Matrix is method is sub-optimal, due to the use of upper.tri or lower.tri.
library(bench)
A <- matrix(runif(5000 * 5000), 5000)
bench::mark("Mat2Sym" = Mat2Sym(A, FALSE, 128),
"Matrix" = new("dspMatrix", x = A[lower.tri(A, TRUE)], Dim = dim(A),
uplo = "L"),
check = FALSE)
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:t> <dbl> <bch:byt> <dbl> <int>
#1 Mat2Sym 56.5ms 57.9ms 58ms 58.7ms 17.3 2.48KB 0 9
#2 Matrix 934.7ms 934.7ms 935ms 934.7ms 1.07 524.55MB 2 1
In particular, we see that using "dspMatrix" is even less efficient that using "dsyMatrix".
Before making possibly premature optimizations with C/C++ check whether a dense matrix
A + t(A)
is sufficient (assuming A's only nonzero elements are below the diagonal or above it.
Also, if memory is the problem then the Matrix package has the packed symmetric class dspMatrix which can be created like this:
library(Matrix)
A <- matrix(c(0, 2, 3, 0, 0, 4, 0, 0, 0), 3) # dense lower triangular test input
dspA <- as(A + t(A), "dspMatrix")
giving:
> str(dspA)
Formal class 'dspMatrix' [package "Matrix"] with 5 slots
..@ x : num [1:6] 0 2 0 3 4 0 <- only 6 elements stored, not 9
..@ Dim : int [1:2] 3 3
..@ Dimnames:List of 2
.. ..$ : NULL
.. ..$ : NULL
..@ uplo : chr "U"
..@ factors : list()
or one can create it directly from the upper triangular part:
# use upper triangular part since we already created dspA that way
tA <- t(A)
dspA2 <- new("dspMatrix", Dim = as.integer(c(3,3)),
x = tA[upper.tri(tA, diag = TRUE)])
identical(dspA, dspA2)
## [1] TRUE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With