Efficient row-wise matrix operation in R

I have two matrices, M1 and M2. For each row of M1, I want to multiply it element-wise by each row of M2 and take the maximum value of each of those products, so the result has nrow(M1) rows and nrow(M2) columns.
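To make the target concrete, entry `[i, k]` of the result is `max(M1[i, ] * M2[k, ])`. A minimal sketch of that definition with two nested loops, on tiny made-up matrices `A` and `B` (hypothetical illustration data, not from the question):

```r
# Tiny hypothetical inputs: A is 2 x 3, B is 3 x 3
A <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 0, 2,
              0, 1, 0,
              3, 1, 1), nrow = 3, byrow = TRUE)

# Naive definition: result[i, k] = max over columns j of A[i, j] * B[k, j]
result <- matrix(NA_real_, nrow = nrow(A), ncol = nrow(B))
for (i in seq_len(nrow(A))) {
  for (k in seq_len(nrow(B))) {
    result[i, k] <- max(A[i, ] * B[k, ])
  }
}
print(result)
# row 1: 6 2 3; row 2: 12 5 12
```

This is far too slow for the real sizes, but it is a handy reference to check faster versions against.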

I have tried the following implementation, which produces the result I want:

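set.seed(1)
st_time = Sys.time()
M1 = matrix(runif(1000*10), nrow=1000, ncol=10)
M2 = matrix(runif(10000*10), nrow=10000, ncol=10)

score = apply(M1, 1, function(x){
  # Scale column j of M2 by x[j]
  w = M2 %*% diag(x)
  # Maximum of each scaled row of M2
  row_max = apply(w, 1, max)
  return(row_max)
})
# apply() returns one column per row of M1, so transpose to get 1000 x 10000
required_output = t(score)
Sys.time() - st_time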

This takes 16 seconds on my machine. Is there a faster implementation? Thanks!
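Some of the cost comes from `M2 %*% diag(x)`: it builds a 10 x 10 diagonal matrix and runs a full matrix product just to scale each column of M2 by `x`. The inner `apply(w, 1, max)` also makes 10,000 R-level calls to `max` for every row of M1, which is likely the larger cost. A small sketch (hypothetical names `M2_small` and `x`) showing that the same column scaling can be written without `diag()`, either with R's recycling rule or with `sweep()`:

```r
set.seed(1)
# Small hypothetical data: 5 rows, 4 columns
M2_small <- matrix(runif(5 * 4), nrow = 5, ncol = 4)
x <- runif(4)

# The question's approach: multiply by a 4 x 4 diagonal matrix
w_diag <- M2_small %*% diag(x)

# Recycling: t(M2_small) is 4 x 5, so the length-4 vector x is recycled
# down each column, i.e. row j of t(M2_small) is multiplied by x[j]
w_recycle <- t(t(M2_small) * x)

# sweep(): multiply each column (MARGIN = 2) by the matching element of x
w_sweep <- sweep(M2_small, 2, x, `*`)

all.equal(w_diag, w_recycle)
all.equal(w_diag, w_sweep)
```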

Asked Feb 18 '26 11:02 by user124543131234523


2 Answers

Using a for loop together with matrixStats::colMaxs gives quite a speed-up for me:

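set.seed(1)
M1 = matrix(runif(1000*10), nrow=1000, ncol=10)
M2 = matrix(runif(10000*10), nrow=10000, ncol=10)

st_time = Sys.time()

# Transpose once so that each row of M2 becomes a column
tm = t(M2)
out = matrix(0, nrow=nrow(M1), ncol=nrow(M2))

for(i in seq_len(nrow(M1))){
  # M1[i, ] is recycled down each column of tm; colMaxs() then takes the
  # maximum for each row of M2 (requires the matrixStats package)
  out[i, ] = matrixStats::colMaxs(M1[i, ] * tm)
}

Sys.time() - st_time
# Time difference of 1.835793 secs (was ~28 secs with yours on my laptop)

# Compare with required_output from the question's code
all.equal(required_output, out)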
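The loop above relies on R's recycling rule: `tm` has one row per column of M1, so `M1[i, ] * tm` multiplies row j of `tm` by `M1[i, j]`, and each column k of the product holds `M1[i, ] * M2[k, ]`. A base-R sketch of the same step on small hypothetical matrices, using `apply(..., 2, max)` in place of `matrixStats::colMaxs` (slower, but needs no extra package), checked against the question's formulation:

```r
set.seed(1)
# Small hypothetical inputs with the same shape conventions as the question
M1_small <- matrix(runif(3 * 4), nrow = 3, ncol = 4)
M2_small <- matrix(runif(6 * 4), nrow = 6, ncol = 4)

tm <- t(M2_small)  # 4 x 6: column k holds row k of M2_small

out_small <- matrix(0, nrow = nrow(M1_small), ncol = nrow(M2_small))
for (i in seq_len(nrow(M1_small))) {
  # Length-4 vector times 4 x 6 matrix: the vector is recycled down each
  # column, so column k becomes M1_small[i, ] * M2_small[k, ]
  prod_i <- M1_small[i, ] * tm
  # Column maxima = maximum over j for each row k of M2_small
  out_small[i, ] <- apply(prod_i, 2, max)
}

# Reference: the question's apply/diag version on the same data
ref <- t(apply(M1_small, 1, function(x) apply(M2_small %*% diag(x), 1, max)))
all.equal(ref, out_small)
```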
Answered Feb 20 '26 01:02 by user2957945


Running in parallel gives an easy speed-up. On my machine, the serial version takes 15 seconds and the parallel version just under 4 seconds.

Load the package

# Comes with R
library(parallel)

# Make the cluster 
# 8 cores, see detectCores() 
cl = makeCluster(8)

Then we need to explicitly export M2, because the workers are separate R processes that cannot see the objects in our session

clusterExport(cl, "M2")

and run as normal

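score_par = function() {
  parApply(cl, M1, 1, function(x){
    w = M2 %*% diag(x)
    row_max = apply(w, 1, max)
    return(row_max)
  })
}
system.time(score_par_out <- score_par())

# Like apply(), parApply() returns one column per row of M1,
# so transpose before comparing with required_output
all.equal(required_output, t(score_par_out))

# Shut down the worker processes when finished
stopCluster(cl)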
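A self-contained sketch of the same parallel approach, assuming a machine where at least 2 workers can be started (the worker count, the reduced matrix sizes and the helper name `row_max_scores` are all hypothetical). Instead of `clusterExport()`, it passes M2 through `parApply()`'s `...` argument, and it stops the cluster even if an error occurs:

```r
library(parallel)

set.seed(1)
# Smaller hypothetical sizes so the sketch runs quickly
M1_small <- matrix(runif(50 * 10), nrow = 50, ncol = 10)
M2_small <- matrix(runif(200 * 10), nrow = 200, ncol = 10)

# Hypothetical helper: for one row x of M1, the maximum of each row of M2
# after scaling its columns by x (same computation as the question)
row_max_scores <- function(x, M2) {
  apply(M2 %*% diag(x), 1, max)
}

cl <- makeCluster(2)  # assumption: 2 workers; see detectCores()
score_par_mat <- tryCatch(
  # M2 = M2_small is forwarded to row_max_scores on each worker
  parApply(cl, M1_small, 1, row_max_scores, M2 = M2_small),
  finally = stopCluster(cl)
)

# Transpose to nrow(M1) x nrow(M2) and compare with the serial version
par_out <- t(score_par_mat)
serial_out <- t(apply(M1_small, 1, row_max_scores, M2 = M2_small))
all.equal(serial_out, par_out)
```

Passing M2 as an argument avoids relying on a global variable on the workers, at the cost of sending it along with each chunk of work.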
Answered Feb 19 '26 23:02 by csgillespie


