Faster way of calculating off-diagonal averages in large matrices

Tags: r, matrix, average

I need to calculate the mean of each off-diagonal in an n × n matrix. The lower and upper triangles are redundant, so I only need one of them. Here's the code I'm currently using:

A <- replicate(500, rnorm(500))
sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)]))

This seems to work, but it does not scale well to larger matrices. Mine aren't huge, around 2000^2 to 5000^2, but even at 1000^2 it takes longer than I'd like:

A <- replicate(1000, rnorm(1000)) 
system.time(sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)])))
>   user  system elapsed 
> 26.662   4.846  31.494

Is there a smarter way of doing this?

Edit: To clarify, I'd like the mean of each diagonal independently, e.g. for:

 1 2 3 4
 1 2 3 4
 1 2 3 4
 1 2 3 4

I would like:

 mean(c(1,2,3))
 mean(c(1,2))
 mean(1)

asked Dec 17 '12 13:12

blmoore

2 Answers

You can get a significant speedup just by extracting the diagonals directly using linear indexing. Here, superdiag extracts the ith superdiagonal from A (i = 1 is the principal diagonal):

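superdiag <- function(A, i) {
  # R stores matrices column by column, so element (r, c)
  # sits at linear position (c - 1) * n + r
  n <- nrow(A)
  r <- seq_len(n - i + 1)  # rows 1..(n - i + 1)
  cc <- i:n                # matching columns i..n
  A[(cc - 1) * n + r]
}

superdiagmeans <- function(A) {
  # i = 2..n skips the principal diagonal (i = 1)
  sapply(2:nrow(A), function(i) mean(superdiag(A, i)))
}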
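
To see why the index arithmetic works, here is a minimal sketch on a 4 × 4 matrix whose values equal their own linear positions. The names m, idx and first_super are only for illustration:

```r
# 4 x 4 matrix filled column by column with 1..16,
# so each value equals its own linear index
m <- matrix(1:16, nrow = 4)
n <- nrow(m)

# First superdiagonal: rows 1..3 paired with columns 2..4
r  <- 1:3
cc <- 2:4
idx <- (cc - 1) * n + r   # linear positions 5, 10, 15

first_super <- m[idx]

# Same elements as picking them out with explicit (row, column) pairs
same <- identical(first_super, m[cbind(r, cc)])
```

Because only n - i + 1 positions are computed per diagonal, no full-size row(A) or col(A) matrix has to be built and compared on each iteration.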

Running this on a 1000 × 1000 matrix gives a speedup of roughly 760x (29.8 s down to 0.04 s elapsed):

> A <- replicate(1000, rnorm(1000))

> system.time(sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)])))
   user  system elapsed 
 26.464   3.345  29.793 

> system.time(superdiagmeans(A))
   user  system elapsed 
  0.033   0.006   0.039

This gives you results in the same order as the original.
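
One way to check this yourself is to compare both approaches on a small random matrix. This is only a sketch: it restates the two functions so it runs on its own, and the names slow and fast are illustrative.

```r
superdiag <- function(A, i) {
  n <- nrow(A)
  r <- 1:(n - i + 1)
  cc <- i:n
  A[(cc - 1) * n + r]
}
superdiagmeans <- function(A) {
  sapply(2:nrow(A), function(i) mean(superdiag(A, i)))
}

set.seed(1)
A <- matrix(rnorm(25), nrow = 5)

# Original approach from the question
slow <- sapply(1:(nrow(A) - 1), function(x) mean(A[row(A) == (col(A) - x)]))
# Linear-indexing approach
fast <- superdiagmeans(A)

# all.equal compares with a small numeric tolerance
all.equal(slow, fast)
```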

answered Nov 15 '22 21:11

Jonathan Dursi

You can use the following function:

diagmean <- function(x){
  id <- row(x) - col(x)
  sol <- tapply(x,id,mean)
  sol[names(sol)!='0']
}
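
The grouping works because row(x) - col(x) is constant along each diagonal: 0 on the main diagonal, negative above it and positive below it. A small illustration, with a made-up matrix m:

```r
m <- matrix(1:9, nrow = 3)
id <- row(m) - col(m)
id
#      [,1] [,2] [,3]
# [1,]    0   -1   -2
# [2,]    1    0   -1
# [3,]    2    1    0

# tapply splits the values of m by diagonal label and averages each group
means <- tapply(m, id, mean)
```

The names of the result are the diagonal labels, which is why the function can drop the main diagonal by filtering on the name '0'.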

If we check this on your matrix, the speed gain is substantial:

> system.time(diagmean(A))
   user  system elapsed 
   2.58    0.00    2.58 

> system.time(sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)])))
   user  system elapsed 
  38.93    4.01   42.98

Note that this function calculates the means for both the upper and lower triangles. To calculate only one of them, e.g. the upper triangle (where row < col, matching your original code), use:

diagmean <- function(A){
  id <- row(A) - col(A)
  id[id>=0] <- NA
  tapply(A,id,mean)
}

This results in another speed gain. Note that the result is in reverse order compared to yours:

> A <- matrix(rep(c(1,2,3,4),4),ncol=4)

> sapply(1:(nrow(A)-1), function(x) mean(A[row(A) == (col(A) - x)]))
[1] 2.0 1.5 1.0

> diagmean(A)
 -3  -2  -1 
1.0 1.5 2.0
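
If you need the same ordering as the original code, one option is to reverse the result and drop the labels. A sketch using base rev() and as.vector(), restating the second diagmean so it runs on its own (the name res is illustrative):

```r
diagmean <- function(A) {
  id <- row(A) - col(A)
  id[id >= 0] <- NA        # keep only entries above the main diagonal
  tapply(A, id, mean)
}

A <- matrix(rep(c(1, 2, 3, 4), 4), ncol = 4)

# Reverse so the diagonal next to the main one comes first,
# then strip the names and dim attributes to get a plain numeric vector
res <- as.vector(rev(diagmean(A)))
# res is c(2, 1.5, 1), matching the sapply() output above
```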

answered Nov 15 '22 21:11

Joris Meys
