
Must speed up row-wise operations

Tags: r

I need to perform row-wise operations more than 15 million times, but my code is too slow. Here is a small reproducible example:

costMatrix1 <- rbind(c(4.2,3.6,2.1,2.3),c(9.6,5.5,7.2,4.9),c(2.6,8.2,6.4,8.3),c(4.8,3.3,6.8,5.7))
costMatrix2 <- costMatrix1 # Example only; the real costMatrix2 differs from costMatrix1

tbl_Filter <- rbind(c(0,0,0,4),c(1,2,3,4),c(1,0,3,0),c(1,2,0,0),c(1,2,0,4))

tbl_Sums <- data.frame(matrix(0, nrow=nrow(tbl_Filter), ncol=2)) # one result row per filter row
colnames(tbl_Sums) <- c("Sum1","Sum2")

for (i in 1:nrow(tbl_Filter))
{
  tbl_Sums[i,1] <- sum(costMatrix1[tbl_Filter[i,],tbl_Filter[i,]])
  tbl_Sums[i,2] <- sum(costMatrix2[tbl_Filter[i,],tbl_Filter[i,]])
}

I think replacing the for loop with ddply is the solution, but I can't get it to work.

Chris asked Dec 22 '22 01:12


1 Answer

If you have very large arrays to work with, you are probably better off sticking to base R.

Here is how you could use sapply to solve the summing problem for a single matrix, and then call that function once for each input matrix:

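sumOne <- function(cost, filter){
  # For each filter row, sum the sub-matrix of cost selected by that row's
  # indices (zero entries are dropped by R's indexing)
  sapply(seq_len(nrow(filter)), function(i) sum(cost[filter[i,], filter[i,]]))
}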


cbind(
    sumOne(costMatrix1, tbl_Filter),
    sumOne(costMatrix2, tbl_Filter)
)

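The results (the second column here is exactly twice the first, so this run evidently used a costMatrix2 equal to 2 * costMatrix1; with costMatrix2 <- costMatrix1 as in the question, both columns would be identical):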

     [,1]  [,2]
[1,]  5.7  11.4
[2,] 85.5 171.0
[3,] 15.3  30.6
[4,] 22.9  45.8
[5,] 43.9  87.8
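
If there are more than two cost matrices, one option is to keep them in a named list and let sapply assemble the result matrix column by column. The following is only a sketch: the list name costs and the column names Sum1 and Sum2 are illustrative, sumOne is redefined (with vapply, a stricter variant of sapply) so the snippet runs on its own, and costMatrix2 is set equal to costMatrix1 as in the question's placeholder.

```r
# Example data copied from the question
costMatrix1 <- rbind(c(4.2, 3.6, 2.1, 2.3),
                     c(9.6, 5.5, 7.2, 4.9),
                     c(2.6, 8.2, 6.4, 8.3),
                     c(4.8, 3.3, 6.8, 5.7))
costMatrix2 <- costMatrix1  # placeholder, as in the question
tbl_Filter <- rbind(c(0, 0, 0, 4), c(1, 2, 3, 4), c(1, 0, 3, 0),
                    c(1, 2, 0, 0), c(1, 2, 0, 4))

# Same idea as sumOne above; vapply checks that every result is one number
sumOne <- function(cost, filter) {
  vapply(seq_len(nrow(filter)),
         function(i) sum(cost[filter[i, ], filter[i, ]]),
         numeric(1))
}

# Hypothetical named list of cost matrices; the names become column names
costs <- list(Sum1 = costMatrix1, Sum2 = costMatrix2)

# One column per cost matrix, one row per filter row
tbl_Sums <- sapply(costs, sumOne, filter = tbl_Filter)
print(tbl_Sums)
```

The result is a plain numeric matrix rather than a data frame; it can be wrapped in as.data.frame() once at the end if a data frame is needed.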

This should be much, much faster than your loop. The reason is not that a for loop is intrinsically slower than sapply (it's not), but that sapply automatically reserves memory for the result, whereas your loop repeatedly assigns into a data frame with [<-, which is slow.
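
To illustrate that the loop itself is not the problem, here is a sketch of the original for loop writing into a preallocated numeric matrix instead of a data frame. It uses the question's example data (with costMatrix2 as the same placeholder copy); the variable names n and idx are introduced here for illustration. No timings are claimed; how much this helps on the real data would need to be measured, for example with system.time().

```r
# Example data copied from the question
costMatrix1 <- rbind(c(4.2, 3.6, 2.1, 2.3),
                     c(9.6, 5.5, 7.2, 4.9),
                     c(2.6, 8.2, 6.4, 8.3),
                     c(4.8, 3.3, 6.8, 5.7))
costMatrix2 <- costMatrix1  # placeholder, as in the question
tbl_Filter <- rbind(c(0, 0, 0, 4), c(1, 2, 3, 4), c(1, 0, 3, 0),
                    c(1, 2, 0, 0), c(1, 2, 0, 4))

n <- nrow(tbl_Filter)

# Preallocate a numeric matrix (not a data frame) with one row per filter row
tbl_Sums <- matrix(0, nrow = n, ncol = 2,
                   dimnames = list(NULL, c("Sum1", "Sum2")))

for (i in seq_len(n)) {
  idx <- tbl_Filter[i, ]  # zero entries are dropped by R's indexing
  tbl_Sums[i, "Sum1"] <- sum(costMatrix1[idx, idx])
  tbl_Sums[i, "Sum2"] <- sum(costMatrix2[idx, idx])
}
print(tbl_Sums)
```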

Andrie answered Dec 24 '22 00:12