I need to calculate the mean of each off-diagonal element in an n × n matrix. The lower and upper triangles are redundant. Here's the code I'm currently using:
A <- replicate(500, rnorm(500))
sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)]))
Which seems to work but does not scale well with larger matrices. The ones I have aren't huge, around 2-5000^2, but even with 1000^2 it's taking longer than I'd like:
A <- replicate(1000, rnorm(1000))
system.time(sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)])))
> user system elapsed
> 26.662 4.846 31.494
Is there a smarter way of doing this?
edit To clarify, I'd like the mean of each diagonal independently, e.g. for:
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
I would like:
mean(c(1,2,3))
mean(c(1,2))
mean(1)
An element aij of a matrix A = [aij] is a diagonal elements of matrix if i = j, such as when rows and column suffixes are equal. Thus, a11 , a22 , a33, a44, … so on are diagonal elements of the matrix A = [aij]. The principal diagonal is also known as the leading diagonal.
In a table of numbers that has the same number of rows as columns, the entries that are not in the Main Diagonal are referred to as the off-diagonal entries in the table.
Description. b = trace( A ) calculates the sum of the diagonal elements of matrix A : tr ( A ) = ∑ i = 1 n a i i = a 11 + a 22 + ... + a n n .
Efficient approach: The idea to find the sum of values of principal diagonal is to iterate to N and use the value of matrix[row][row] for the summation of princliple diagonal and to find the sum of values of secondary diagonal is to use the value of matrix[row][N – (row + 1)] for summation.
You can get significantly faster just by extracting the diagonals directly using linear addressing: superdiag
here extracts the ith superdiagonal from A (i=1 is the principal diagonal)
superdiag <- function(A,i) {
n<-nrow(A);
len<-n-i+1;
r <- 1:len;
c <- i:n;
indices<-(c-1)*n+r;
A[indices]
}
superdiagmeans <- function(A) {
sapply(2:nrow(A), function(i){mean(superdiag(A,i))})
}
Running this on a 1K square matrix gives a ~800x speedup:
> A <- replicate(1000, rnorm(1000))
> system.time(sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)])))
user system elapsed
26.464 3.345 29.793
> system.time(superdiagmeans(A))
user system elapsed
0.033 0.006 0.039
This gives you results in the same order as the original.
You can use the following function :
diagmean <- function(x){
id <- row(x) - col(x)
sol <- tapply(x,id,mean)
sol[names(sol)!='0']
}
If we check this on your matrix, the speed gain is substantial:
> system.time(diagmean(A))
user system elapsed
2.58 0.00 2.58
> system.time(sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)])))
user system elapsed
38.93 4.01 42.98
Note that this function calculates both upper and lower triangles. You can calculate eg only the lower triangular using:
diagmean <- function(A){
id <- row(A) - col(A)
id[id>=0] <- NA
tapply(A,id,mean)
}
This results in another speed gain. Note that the solution will be reversed compared to yours :
> A <- matrix(rep(c(1,2,3,4),4),ncol=4)
> sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)]))
[1] 2.0 1.5 1.0
> diagmean(A)
-3 -2 -1
1.0 1.5 2.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With