I know that improving for loops has been asked about many times before, and that we can often replace a for loop in R with the apply family of functions.
However, is there a way to speed up manipulations of a matrix when those manipulations depend on another matrix? Here is what I mean: the elements I set to 2 in test are determined by another matrix, index:
for (i in 1:nrow(test)){
test[i,index[i,]] <- 2
} # where index is predetermined matrix
Another example is this one, where I set the values in test based on the ordering of the elements in each row of another matrix, anyMatrix:
for (i in 1:nrow(test)){
test[i,] <- order(anyMatrix[i,])
}
I could use lapply or sapply here, but they return a list, and converting that back to a matrix takes about as much time as the loop itself.
Reproducible example:
test <- matrix(0, nrow = 10, ncol = 10)
set.seed(1234)
index <- matrix(sample.int(10, 10*10, TRUE), 10, 10)
anyMatrix <- matrix(rnorm(10*10), nrow = 10, ncol = 10)
for (i in 1:nrow(test)){
test[i,index[i,]] <- 2
}
for (i in 1:nrow(test)){
test[i,] <- order(anyMatrix[i,])
}
You really appear to have two separate problems here.
Problem 1: Given a matrix index, for each row i and column j you want to set test[i,j] to 2 if j appears in row i of index. This can be done with simple matrix indexing: pass a 2-column matrix of indices, where the first column holds the row of each element you want to set and the second column holds its column:
test <- matrix(0, nrow = 10, ncol = 10)  # start again from an all-zero matrix
test[cbind(as.vector(row(index)), as.vector(index))] <- 2
test
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 2 2 0 2 2 2 2 0 2 2
# [2,] 2 0 2 2 2 2 2 0 2 2
# [3,] 2 2 2 2 0 0 2 2 0 0
# [4,] 2 2 0 0 0 2 2 2 0 2
# [5,] 2 2 2 2 0 0 0 0 2 0
# [6,] 0 0 0 0 0 2 2 2 2 0
# [7,] 2 0 2 2 2 2 2 0 0 0
# [8,] 2 0 2 2 2 2 0 2 0 2
# [9,] 2 2 2 2 0 0 2 0 2 2
# [10,] 2 0 2 0 0 2 2 2 2 0
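To see why the two-column index lines up correctly, here is a small sketch on made-up data (the 2 x 3 matrix idx and the target m are hypothetical, chosen only for illustration). as.vector flattens both row(idx) and idx in column-major order, so the k-th row of the pair matrix always holds the row number and the column value of the same element of idx:

```r
# Hypothetical 2 x 3 index matrix; matrix() fills column-major,
# so row 1 is (3, 2, 1) and row 2 is (1, 3, 1)
idx <- matrix(c(3, 1, 2, 3, 1, 1), nrow = 2)

# Each row of `pairs` is one (row, column) position to set
pairs <- cbind(as.vector(row(idx)), as.vector(idx))
pairs

m <- matrix(0, nrow = 2, ncol = 4)
m[pairs] <- 2   # duplicate pairs, like (2, 1) here, are harmless
m               # row 1 is 2 2 2 0, row 2 is 2 0 2 0

# Same result as the row-by-row loop from the question
m_loop <- matrix(0, nrow = 2, ncol = 4)
for (i in seq_len(nrow(m_loop))) {
  m_loop[i, idx[i, ]] <- 2
}
identical(m, m_loop)
```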
Since this performs all the assignments in a single vectorized operation, it should be faster than looping through the rows and handling them one at a time. Here's a benchmark with 1 million rows and 10 columns:
OP <- function(test, index) {
for (i in 1:nrow(test)){
test[i,index[i,]] <- 2
}
test
}
josliber <- function(test, index) {
test[cbind(as.vector(row(index)), as.vector(index))] <- 2
test
}
test.big <- matrix(0, nrow = 1000000, ncol = 10)
set.seed(1234)
index.big <- matrix(sample.int(10, 1000000*10, TRUE), 1000000, 10)
identical(OP(test.big, index.big), josliber(test.big, index.big))
# [1] TRUE
system.time(OP(test.big, index.big))
# user system elapsed
# 1.564 0.014 1.591
system.time(josliber(test.big, index.big))
# user system elapsed
# 0.408 0.034 0.444
Here, the vectorized approach is roughly 3.5x faster in elapsed time.
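If you would rather pass a single index vector than a two-column matrix, the same assignment can also be written with linear indexing. This is a swapped-in formulation, not part of the answer above, and it has not been benchmarked here; it assumes, as in the question, that index has the same number of rows as test. Because R stores matrices column-major, element [i, j] of an n-row matrix sits at linear position (j - 1) * n + i:

```r
set.seed(1234)
test <- matrix(0, nrow = 10, ncol = 10)
index <- matrix(sample.int(10, 10 * 10, TRUE), 10, 10)

# Two-column (row, column) indexing, as in the answer
via_pairs <- test
via_pairs[cbind(as.vector(row(index)), as.vector(index))] <- 2

# Linear (column-major) indexing: element [i, j] is at (j - 1) * nrow + i
via_linear <- test
via_linear[(as.vector(index) - 1) * nrow(test) + as.vector(row(index))] <- 2

identical(via_pairs, via_linear)
```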
Problem 2: You want to set row i of test to the result of order applied to the corresponding row of anyMatrix. You can do this with apply:
(test <- t(apply(anyMatrix, 1, order)))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 10 7 8 4 5 3 6 2 9
# [2,] 8 7 1 6 3 4 9 5 10 2
# [3,] 4 9 7 1 3 2 6 10 5 8
# [4,] 1 2 6 4 10 3 9 8 7 5
# [5,] 9 6 5 1 2 7 10 4 8 3
# [6,] 9 3 8 6 5 10 1 4 7 2
# [7,] 3 7 2 5 6 8 9 4 1 10
# [8,] 9 8 1 3 4 6 7 10 5 2
# [9,] 8 4 3 6 10 7 9 5 2 1
# [10,] 4 1 9 3 6 7 8 2 10 5
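The t() call is needed because apply with MARGIN = 1 stores each row's result as a column of its output. A minimal sketch on a hypothetical 2 x 3 matrix m shows the shape before and after transposing:

```r
# Small hypothetical matrix: row 1 is (3, 1, 2), row 2 is (7, 9, 8)
m <- matrix(c(3, 1, 2,
              7, 9, 8), nrow = 2, byrow = TRUE)

# apply() with MARGIN = 1 stores each row's result as a column,
# so this is a 3 x 2 matrix
res <- apply(m, 1, order)
dim(res)

# Transposing restores one result row per row of m:
# row 1 is order(c(3, 1, 2)) = 2 3 1, row 2 is order(c(7, 9, 8)) = 1 3 2
t(res)
```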
I wouldn't expect much of a change in runtime here, because apply is essentially looping over the rows the same way your solution does. Still, I would prefer this solution because it's a good deal less typing and the more idiomatic "R" way of doing things.
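If runtime does matter for Problem 2, one fully vectorized alternative (a swapped-in technique, not part of the answer above and not benchmarked here) is to call order once on the whole matrix, using the row number as the primary sort key and the value as the secondary key. Ties within a row are broken by column position, just as order does on a single row. The random matrix below is only test data:

```r
set.seed(42)
anyMatrix <- matrix(rnorm(10 * 10), nrow = 10, ncol = 10)

via_apply <- t(apply(anyMatrix, 1, order))

# Sort all elements by row number first, then by value within each row.
# The result is a vector of linear (column-major) positions: row 1's
# positions first, then row 2's, and so on.
o <- order(row(anyMatrix), anyMatrix)

# Convert each linear position back to its column number ...
cols <- (o - 1L) %/% nrow(anyMatrix) + 1L

# ... and lay those column numbers out as one matrix row per original row
via_order <- matrix(cols, nrow = nrow(anyMatrix), byrow = TRUE)

all(via_apply == via_order)
```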
Note that these two problems called for quite different code, which is typical in R data manipulation: there are many specialized operators, and you need to pick the one that fits your task. I don't think there's a single function, or even a small set of functions, that can handle every matrix manipulation based on data from another matrix.