I have the following matrix/ dataframe:
> e
V1 V2 V3 V4 V5
1 0 2 3 4 5
2 0 0 6 8 10
3 0 0 0 12 15
4 0 0 0 0 20
5 0 0 0 0 0
In this case N=5 (number of rows = number of columns). I would like to fill in the missing values in this symmetric matrix (e[1,2]=e[2,1] etc.). Is there a most efficient way to fill in the missing values (N the matrix size in my case quite big)? Is there a better way than nested loops?
An efficient data structure for storing dense, symmetric matrices is a simple linear array.
# AlgorithmInput a matrix of size nxn from the user with the help of nested for loops and store it in a 2-D array of size nxn. Set a variable ctr to 1. Using a nested for loop check if a[i][j] is equal to a[j][i]. if not, set ctr to zero and break out of the loops.
A matrix is symmetric if and only if it is equal to its transpose. All entries above the main diagonal of a symmetric matrix are reflected into equal entries below the diagonal. A matrix is skew-symmetric if and only if it is the opposite of its transpose. All main diagonal entries of a skew-symmetric matrix are zero.
Just for completion I would like to also show this technique. The addition of the transpose will not work if the lower part of the matrix (under the diagonal) has values filled in, as it will add them to the upper part of the matrix.
Using the Matrix package we can create a sparse Matrix, which in case of creating the symmetric of a big matrix would require much less memory and even speed it up.
In order to create a symmetric sparse matrix from matrix e
we would do:
library(Matrix)
rowscols <- which(upper.tri(e), arr.ind=TRUE)
sparseMatrix(i=rowscols[,1], #rows to fill in
j=rowscols[,2], #cols to fill in
x=e[upper.tri(e)], #values to use (i.e. the upper values of e)
symmetric=TRUE, #make it symmetric
dims=c(nrow(e),nrow(e))) #dimensions
Output:
5 x 5 sparse Matrix of class "dsCMatrix"
[1,] . 2 3 4 5
[2,] 2 . 6 8 10
[3,] 3 6 . 12 15
[4,] 4 8 12 . 20
[5,] 5 10 15 20 .
Microbenchmark:
Let's make a function to make a symmetric Matrix out of a matrix (copies the upper part of matrix to the lower by default):
symmetrise <- function(mat){
rowscols <- which(upper.tri(mat), arr.ind=TRUE)
sparseMatrix(i=rowscols[,1],
j=rowscols[,2],
x=mat[upper.tri(mat)],
symmetric=TRUE,
dims=c(nrow(mat),ncol(mat)) )
}
And test:
> microbenchmark(
e + t(e),
symmetrise(e),
e[lower.tri(e)] <- t(e)[lower.tri(e)],
times=1000
)
Unit: microseconds
expr min lq mean median uq max neval cld
e + t(e) 75.946 99.038 117.1984 110.841 134.9590 246.825 1000 a
symmetrise(e) 5530.212 6246.569 6950.7681 6921.873 7034.2525 48662.989 1000 c
e[lower.tri(e)] <- t(e)[lower.tri(e)] 261.193 322.771 430.4479 349.968 395.3815 36873.894 1000 b
As you can see symmetrise
is actually much slower than e + t(e)
or df[lower.tri(df)] <- t(df)[lower.tri(df)]
but at least you have a function that automatically symmetrises a matrix (takes the upper part and creates the lower by default) and in case you have a big matrix where memory is an issue, this might come in handy.
P.S. Wherever you see a .
in the Matrix this represents a zero. By using a different system a sparse Matrix is a kind of 'compressed' object making it more memory efficient.
Also for speed:
2*symmpart(as.matrix(e))
Here's a benchmark:
Unit: microseconds
expr min lq mean median uq max neval
e + t(e) 572.505 597.194 655.132028 611.5420 628.4860 8424.902 1000
symmetrise(e) 1128.220 1154.562 1215.740071 1167.0020 1185.6585 10656.059 1000
e[lower.tri(e)] <- e[upper.tri(e, FALSE)] 285.013 311.191 350.846885 327.1335 339.5910 8106.006 1000
2 * symmpart(as.matrix(e)) 78.392 93.953 101.330522 102.1860 107.9215 153.628 1000
It can get this speed because it creates a symmetric matrix directly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With