Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster matrix multiplication by replacing a double loop

I have a dataframe which looks a bit as produced by the following code (but much larger)

set.seed(10)    
mat <- matrix(rbinom(200, size=1, prob = .5), ncol = 10)

In the columns are issues and 1 indicates that an observation is interested in a specific issue. I want to generate a network comparing all observations and have a count of issues that each dyad is jointly interested in.

I have produced the following code, which seems to be working fine:

mat2 <- matrix(NA,20,20)

for(i in 1:nrow(mat)){
    for(j in 1:nrow(mat)){
       mat2[i,j] <- sum(as.numeric(mat[i,]==1) + as.numeric(mat[j,]==1) == 2)
    }
 }

So I compare every entry with every other entry, and only if both have a 1 entry (i.e., they are interested), then this sums to 2 and will be counted as joint interest in a topic.

My problem is that my dataset is very large, and the loop now runs for hours already.

Does anyone have an idea how to do this while avoiding the loop?

like image 754
Flow Avatar asked Oct 30 '19 10:10

Flow


2 Answers

going ahead and promoting @jogo's comment as it is by far the fastest (thank's for the hint, I will use that in production as well).

set.seed(10)    
mat <- matrix(rbinom(200, size=1, prob = .5), ncol = 10)
mat2 <- matrix(NA,20,20)
binary_mat <- mat == 1
tmat <- t(mat==1)

microbenchmark::microbenchmark(
  "loop" = for(i in 1:nrow(mat)){
             for(j in 1:nrow(mat)){
               mat2[i,j] <- sum(as.numeric(mat[i,]==1) + as.numeric(mat[j,]==1) == 2)
             }
           }, 
  "apply" = mat4 <- apply(tmat, 2, function(x) colSums(tmat & x)), 
  "matrix multiplication" = mat5 <- mat %*% t(mat),
  "tcrossprod" = tcrossprod(mat),
  "tcrossprod binary" = tcrossprod(binary_mat)
)

On my machine this benchmark results in

Unit: microseconds
                  expr       min        lq        mean    median         uq       max neval cld
                  loop 16699.634 16972.271 17931.82535 17180.397 17546.1545 31502.706   100   b
                 apply   322.942   330.046   395.69045   357.886   368.8300  4299.228   100  a 
 matrix multiplication    21.889    28.801    36.76869    39.360    43.9685    50.689   100  a 
            tcrossprod     7.297     8.449    11.20218     9.984    14.4005    18.433   100  a 
     tcrossprod binary     7.680     8.833    11.08316     9.601    12.0970    35.713   100  a 
like image 36
Tom Avatar answered Sep 20 '22 07:09

Tom


This should be faster:

tmat <- t(mat==1)
mat4 <- apply(tmat, 2, function(x) colSums(tmat & x))
like image 102
jogo Avatar answered Sep 19 '22 07:09

jogo