I need to do row-wise operations more than 15 million times, but my code is too slow. Here is a small reproducible example:
costMatrix1 <- rbind(c(4.2,3.6,2.1,2.3),c(9.6,5.5,7.2,4.9),c(2.6,8.2,6.4,8.3),c(4.8,3.3,6.8,5.7))
costMatrix2 <- costMatrix1 # Placeholder only: the real costMatrix2 differs from costMatrix1
tbl_Filter <- rbind(c(0,0,0,4),c(1,2,3,4),c(1,0,3,0),c(1,2,0,0),c(1,2,0,4))
tbl_Sums <- data.frame(matrix(0, nrow=nrow(tbl_Filter), ncol=2)) # one result row per filter row
colnames(tbl_Sums) <- c("Sum1","Sum2")
for (i in 1:nrow(tbl_Filter))
{
  tbl_Sums[i,1] <- sum(costMatrix1[tbl_Filter[i,],tbl_Filter[i,]])
  tbl_Sums[i,2] <- sum(costMatrix2[tbl_Filter[i,],tbl_Filter[i,]])
}
I think replacing the for loop with ddply is the solution, but I can't get it to work.
If you have very large arrays to work with, you are probably better off sticking to base R.
Here is how you could use sapply to solve the summing problem for a single matrix. Then use it repeatedly on each input matrix:
sumOne <- function(cost, filter){
  sapply(1:nrow(filter), function(i) sum(cost[filter[i,], filter[i,]]))
}
cbind(
  sumOne(costMatrix1, tbl_Filter),
  sumOne(costMatrix2, tbl_Filter)
)
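The results (the second column shown is twice the first, so it appears to come from a costMatrix2 equal to 2 * costMatrix1; with the placeholder costMatrix2 <- costMatrix1 from the question, both columns would be identical):
     [,1]  [,2]
[1,]  5.7  11.4
[2,] 85.5 171.0
[3,] 15.3  30.6
[4,] 22.9  45.8
[5,] 43.9  87.8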
This should be much, much faster than your loop. Not because a for loop is intrinsically slower than sapply (it's not), but because sapply automatically reserves memory for the result, and because [<- (here applied to a data frame on every iteration) is slow.
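To illustrate that the loop itself is not the problem, here is a minimal sketch of a for loop that writes into a preallocated numeric vector instead of a data frame. The function name sumLoop is hypothetical (it does not appear in the question or the answer above), and this is only one way to structure it, not a benchmarked recommendation:

```r
# Example data copied from the question.
costMatrix1 <- rbind(c(4.2,3.6,2.1,2.3), c(9.6,5.5,7.2,4.9),
                     c(2.6,8.2,6.4,8.3), c(4.8,3.3,6.8,5.7))
tbl_Filter <- rbind(c(0,0,0,4), c(1,2,3,4), c(1,0,3,0), c(1,2,0,0), c(1,2,0,4))

# sumLoop is a hypothetical helper name used for illustration only.
sumLoop <- function(cost, filter) {
  out <- numeric(nrow(filter))        # preallocate a plain numeric vector
  for (i in seq_len(nrow(filter))) {
    idx <- filter[i, ]                # zero entries are dropped when indexing
    out[i] <- sum(cost[idx, idx])     # assign into a vector, not a data frame
  }
  out
}

sums1 <- sumLoop(costMatrix1, tbl_Filter)
print(sums1)
```

The per-matrix vectors can then be combined once at the end, for example with cbind or data.frame, so the data frame is built a single time rather than modified on every iteration.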