
Efficiently replicate matrices in R

I have a matrix and am looking for an efficient way to replicate it n times (where n is the number of observations in the dataset). For example, if I have a matrix A

A <- matrix(1:15, nrow=3)

then I want an output of the form

rbind(A, A, A, ...) #n times.

Obviously, there are many ways to construct such a large matrix, for example with a for loop, apply, or similar functions. However, the call to the "matrix-replication function" sits at the very core of my optimization algorithm, where it is called tens of thousands of times during one run of my program. Loops, apply-type functions, and anything similar are therefore not efficient enough. (Such a solution would basically mean that a loop over n is performed tens of thousands of times, which is obviously inefficient.) I have already tried the ordinary rep function, but haven't found a way to arrange its output in a matrix of the desired format.

The solution do.call("rbind", replicate(n, A, simplify=FALSE)) is also too inefficient, because rbind is called too often in this case. (About 30% of the total runtime of my program is then spent performing the rbinds.)

Does anyone know a better solution?

Wolfgang Pößnecker asked Oct 23 '12 16:10



2 Answers

Two more solutions:

The first is a modification of the example in the question:

do.call("rbind", rep(list(A), n))

The second involves unrolling the matrix, replicating it, and reassembling it:

matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE)
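
As a sanity check, here is a minimal sketch confirming that both one-liners produce the same result as an explicit rbind. The value n = 4 and the variable names expected, via_list, and via_rep are chosen only for illustration.

```r
A <- matrix(1:15, nrow = 3)
n <- 4

# Reference result: stack n = 4 copies explicitly
expected <- rbind(A, A, A, A)

# Solution 1: bind a list holding n copies of A
via_list <- do.call("rbind", rep(list(A), n))

# Solution 2: t(A) lays the values out row by row, rep() repeats that
# sequence n times, and byrow = TRUE fills the rows back in the same order
via_rep <- matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)

stopifnot(all(dim(via_list) == dim(expected)), all(via_list == expected))
stopifnot(all(dim(via_rep) == dim(expected)), all(via_rep == expected))
```

The transpose matters because R stores matrices column by column: without t() and byrow=TRUE, rep would repeat the column-major sequence and the rows would come out scrambled.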

Since efficiency is what was requested, benchmarking is necessary:

library("rbenchmark")
A <- matrix(1:15, nrow=3)
n <- 10

benchmark(rbind(A, A, A, A, A, A, A, A, A, A),
          do.call("rbind", replicate(n, A, simplify=FALSE)),
          do.call("rbind", rep(list(A), n)),
          apply(A, 2, rep, n),
          matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE),
          order="relative", replications=100000)

which gives:

                                                 test replications elapsed
1                 rbind(A, A, A, A, A, A, A, A, A, A)       100000    0.91
3                   do.call("rbind", rep(list(A), n))       100000    1.42
5  matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)       100000    2.20
2 do.call("rbind", replicate(n, A, simplify = FALSE))       100000    3.03
4                                 apply(A, 2, rep, n)       100000    7.75
  relative user.self sys.self user.child sys.child
1    1.000      0.91        0         NA        NA
3    1.560      1.42        0         NA        NA
5    2.418      2.19        0         NA        NA
2    3.330      3.03        0         NA        NA
4    8.516      7.73        0         NA        NA

So the fastest is the raw rbind call, but that assumes n is fixed and known ahead of time. If n is not fixed, then the fastest is do.call("rbind", rep(list(A), n)). These timings were for a 3x5 matrix and 10 replications. Different sized matrices might give different orderings.
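
Because the ordering may depend on matrix size and n, it is worth re-timing on your own data. The following is only a rough sketch using base R's system.time instead of rbenchmark; the helper time_it and the chosen values of n and reps are assumptions for illustration, and the numbers it prints will vary by machine.

```r
# Hypothetical helper: elapsed seconds for `reps` evaluations of a function
time_it <- function(f, reps = 1000) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]]
}

A <- matrix(1:15, nrow = 3)

for (n in c(10, 100, 600)) {
  timings <- c(
    do_call = time_it(function() do.call("rbind", rep(list(A), n))),
    apply   = time_it(function() apply(A, 2, rep, n)),
    matrix  = time_it(function() matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE))
  )
  cat("n =", n, "\n")
  print(round(timings, 3))
}
```

Substituting your real matrix for A and your real n gives a more relevant comparison than the toy 3x5 case.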

EDIT:

For n=600, the results are in a different order (leaving out the explicit rbind version):

A <- matrix(1:15, nrow=3)
n <- 600

benchmark(do.call("rbind", replicate(n, A, simplify=FALSE)),
          do.call("rbind", rep(list(A), n)),
          apply(A, 2, rep, n),
          matrix(rep(t(A),n), ncol=ncol(A), byrow=TRUE),
          order="relative", replications=10000)

giving

                                                 test replications elapsed
4  matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)        10000    1.74
3                                 apply(A, 2, rep, n)        10000    2.57
2                   do.call("rbind", rep(list(A), n))        10000    2.79
1 do.call("rbind", replicate(n, A, simplify = FALSE))        10000    6.68
  relative user.self sys.self user.child sys.child
4    1.000      1.75        0         NA        NA
3    1.477      2.54        0         NA        NA
2    1.603      2.79        0         NA        NA
1    3.839      6.65        0         NA        NA

If you include the explicit rbind version, it is slightly faster than the do.call("rbind", rep(list(A), n)) version, but not by much, and slower than either the apply or matrix versions. So a generalization to arbitrary n does not require a loss of speed in this case.
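
For use inside an optimization loop, the matrix version can be wrapped in a small function. This is a sketch, not a tuned implementation; the name replicate_rows is hypothetical, and note that this approach drops any dimnames A may have.

```r
# Hypothetical wrapper (name is illustrative): stack n copies of A row-wise
replicate_rows <- function(A, n) {
  stopifnot(is.matrix(A), length(n) == 1, n >= 1)
  matrix(rep(t(A), n), ncol = ncol(A), byrow = TRUE)
}

A <- matrix(1:15, nrow = 3)
B <- replicate_rows(A, 600)
dim(B)  # 1800 rows, 5 columns
```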

Brian Diggs answered Sep 21 '22 15:09


Probably this is more efficient:

apply(A, 2, rep, n)
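
A short sketch of why this works, with n = 3 chosen for illustration: apply passes each column to rep, and the repeated columns are bound side by side. The row-index variant shown next to it does not come from this answer; it is a base R alternative included only for comparison, with no claim about its speed.

```r
A <- matrix(1:15, nrow = 3)
n <- 3

# apply() calls rep(column, n) for each column and binds the results as columns
via_apply <- apply(A, 2, rep, n)

# Alternative not benchmarked here: repeat the row indices 1..nrow(A) n times
via_index <- A[rep(seq_len(nrow(A)), times = n), , drop = FALSE]

stopifnot(all(via_apply == rbind(A, A, A)),
          all(via_index == rbind(A, A, A)))
```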
Sven Hohenstein answered Sep 22 '22 15:09