I have a matrix and am looking for an efficient way to replicate it n times (where n is the number of observations in the dataset). For example, if I have a matrix A
A <- matrix(1:15, nrow=3)
then I want an output of the form
rbind(A, A, A, ...) #n times
Obviously, there are many ways to construct such a large matrix, for example using a for loop, apply, or similar functions. However, the call to the "matrix-replication-function" takes place in the very core of my optimization algorithm, where it is called tens of thousands of times during one run of my program. Therefore, loops, apply-type functions, and anything similar are not efficient enough. (Such a solution would basically mean that a loop over n is performed tens of thousands of times, which is obviously inefficient.) I already tried to use the ordinary rep function, but haven't found a way to arrange the output of rep in a matrix of the desired format.
The solution
do.call("rbind", replicate(n, A, simplify=FALSE))
is also too inefficient because rbind is called too often in this case. (About 30% of the total runtime of my program is then spent performing the rbinds.)
Does anyone know a better solution?
A matrix can be created with the matrix function in R, so to build a matrix by replicating a vector we only need to take care of the replication. For example, given a vector V, a two-row matrix whose rows are both V can be created as matrix(rep(V, 2), nrow=2, byrow=TRUE).
The replicate() function in R evaluates an expression a given number of times and collects the results.
For example, if a matrix has only one row and three columns, replicating it three times repeats that row three times. This can be done with the rep function together with the matrix function.
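The one-row case described above can be sketched as follows. This is a minimal illustration, not code from the original page; the names M, n and M_rep and the values 10, 20, 30 are assumptions chosen for the example.

```r
# A one-row, three-column matrix (values chosen only for illustration)
M <- matrix(c(10, 20, 30), nrow = 1)

n <- 3  # number of copies wanted

# rep() flattens M and repeats its values n times; byrow = TRUE then
# lays each run of ncol(M) values down as one row of the result
M_rep <- matrix(rep(M, n), ncol = ncol(M), byrow = TRUE)

print(M_rep)
```

Because M has a single row, no transpose is needed here: its flattened values are already in row order.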
Replicates of a data frame can be created with sapply, using the times argument of rep.int to set the number of repetitions.
Two more solutions:
The first is a modification of the example in the question
do.call("rbind", rep(list(A), n))
The second involves unrolling the matrix, replicating it, and reassembling it.
matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE)
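To see why the second version needs t(A), it may help to compare both solutions against a hand-written rbind. The sketch below assumes a small n of 4 so the reference result can be typed out; the variable names (expected, via_list, via_matrix, without_t) are illustrative only.

```r
A <- matrix(1:15, nrow = 3)
n <- 4

# Reference result, written out by hand for n = 4
expected <- rbind(A, A, A, A)

# Solution 1: build a list of n copies and bind them in one call
via_list <- do.call("rbind", rep(list(A), n))

# Solution 2: R stores matrices column by column, so t(A) flattens to
# the rows of A in order; byrow = TRUE then refills them row by row
via_matrix <- matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)

# Without the transpose the values come out in column order and the
# rows of the result no longer match the rows of A
without_t <- matrix(rep(A, n), ncol = ncol(A), byrow = TRUE)

stopifnot(identical(via_list, expected))
stopifnot(identical(via_matrix, expected))
stopifnot(!identical(without_t, expected))
```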
Since efficiency is what was requested, benchmarking is necessary:
library("rbenchmark")
A <- matrix(1:15, nrow=3)
n <- 10
benchmark(rbind(A, A, A, A, A, A, A, A, A, A),
          do.call("rbind", replicate(n, A, simplify=FALSE)),
          do.call("rbind", rep(list(A), n)),
          apply(A, 2, rep, n),
          matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE),
          order="relative", replications=100000)
which gives:
                                                 test replications elapsed
1                 rbind(A, A, A, A, A, A, A, A, A, A)       100000    0.91
3                   do.call("rbind", rep(list(A), n))       100000    1.42
5  matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)       100000    2.20
2 do.call("rbind", replicate(n, A, simplify = FALSE))       100000    3.03
4                                 apply(A, 2, rep, n)       100000    7.75
  relative user.self sys.self user.child sys.child
1    1.000      0.91        0         NA        NA
3    1.560      1.42        0         NA        NA
5    2.418      2.19        0         NA        NA
2    3.330      3.03        0         NA        NA
4    8.516      7.73        0         NA        NA
So the fastest is the raw rbind call, but that assumes n is fixed and known ahead of time. If n is not fixed, then the fastest is do.call("rbind", rep(list(A), n)). These timings were for a 3x5 matrix and 10 replications. Different sized matrices might give different orderings.
EDIT:
For n=600, the results are in a different order (leaving out the explicit rbind version):
A <- matrix(1:15, nrow=3)
n <- 600
benchmark(do.call("rbind", replicate(n, A, simplify=FALSE)),
          do.call("rbind", rep(list(A), n)),
          apply(A, 2, rep, n),
          matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE),
          order="relative", replications=10000)
giving
                                                 test replications elapsed
4  matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)        10000    1.74
3                                 apply(A, 2, rep, n)        10000    2.57
2                   do.call("rbind", rep(list(A), n))        10000    2.79
1 do.call("rbind", replicate(n, A, simplify = FALSE))        10000    6.68
  relative user.self sys.self user.child sys.child
4    1.000      1.75        0         NA        NA
3    1.477      2.54        0         NA        NA
2    1.603      2.79        0         NA        NA
1    3.839      6.65        0         NA        NA
If you include the explicit rbind version, it is slightly faster than the do.call("rbind", rep(list(A), n)) version, but not by much, and slower than either the apply or matrix versions. So generalizing to arbitrary n does not cost any speed in this case.
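Since the ordering depends on the matrix size and on n, it is worth re-running the comparison with your own dimensions. The sketch below is one way to do that with base R only, assuming system.time is precise enough for the purpose; time_it and candidates are hypothetical names introduced here, and reps is kept smaller than in the benchmark above so it finishes quickly. Timings will differ from machine to machine, so no expected numbers are shown.

```r
A <- matrix(1:15, nrow = 3)
n <- 600
reps <- 1000  # fewer repetitions than the 10000 used above

# Hypothetical helper: total elapsed seconds for `reps` calls of f()
time_it <- function(f, reps) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]]
}

# The four variants from the benchmark, wrapped as functions
candidates <- list(
  do_call_replicate = function() do.call("rbind", replicate(n, A, simplify = FALSE)),
  do_call_rep_list  = function() do.call("rbind", rep(list(A), n)),
  apply_rep         = function() apply(A, 2, rep, n),
  matrix_rep        = function() matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)
)

# Check that every variant builds the same values before timing them
reference <- candidates$do_call_rep_list()
for (f in candidates) stopifnot(all(f() == reference))

timings <- sapply(candidates, time_it, reps = reps)
print(sort(timings))
```

Changing A and n at the top is enough to test other shapes, for example a wide matrix with few replications.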
Probably this is more efficient:
apply(A, 2, rep, n)
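A short sketch of what this call does, with a small n so the result can be checked by eye: apply walks over the columns of A, rep repeats each column n times, and the repeated columns are put back side by side. The names stacked, one_row and flat are illustrative assumptions. One edge case worth knowing is that apply returns a plain vector when each repeated column has length 1, that is, for a one-row matrix with n = 1.

```r
A <- matrix(1:15, nrow = 3)
n <- 2

# Each column of A (length 3) becomes a column of length 3 * n
stacked <- apply(A, 2, rep, n)
stopifnot(all(stacked == rbind(A, A)))

# Edge case: one row and n = 1 gives a vector, not a 1 x 5 matrix
one_row <- matrix(1:5, nrow = 1)
flat <- apply(one_row, 2, rep, 1)
stopifnot(is.null(dim(flat)))
```

If that edge case can occur in your code, the matrix(rep(t(A), n), ...) form avoids it because it always returns a matrix.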