
Efficiently replicate matrix rows by group in R

I am trying to find a way to efficiently replicate rows of a matrix in R based on a group. Let's say I have the following matrix a:

a <- matrix(
  c(1, 2, 3,
    4, 5, 6,
    7, 8, 9),
  ncol = 3, byrow = TRUE
)

I want to create a new matrix where each row in a is repeated based on a number specified in a vector (what I'm calling a "group"), e.g.:

reps <- c(2, 3, 4)

In this case, the resulting matrix would be:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    4    5    6
[4,]    4    5    6
[5,]    4    5    6
[6,]    7    8    9
[7,]    7    8    9
[8,]    7    8    9
[9,]    7    8    9

This is the only solution I've come up with so far:

matrix(
  rep(a, times = rep(reps, times = 3)), 
  ncol = 3, byrow = FALSE
)

Notice that in this solution I have to use rep() twice - first to replicate the reps vector, and then again to actually replicate each row of a.
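To see why both calls are needed: R stores a matrix column by column, so rep() sees a as the vector 1, 4, 7, 2, 5, 8, 3, 6, 9, and the per-element counts have to cycle through reps once for every column. A minimal sketch of that reasoning, using ncol(a) in place of the hard-coded 3 (a substitution for illustration, not a change in behavior):

```r
a <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)

# R stores matrices column by column, so this is the order rep() sees
flat <- as.vector(a)                       # 1 4 7 2 5 8 3 6 9

# One count per element: reps repeated once for every column of a
counts <- rep(reps, times = ncol(a))       # 2 3 4 2 3 4 2 3 4

# Each column is expanded independently, then refilled column-wise
expanded <- matrix(rep(flat, times = counts), ncol = ncol(a), byrow = FALSE)
```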

This solution works, but I'm looking for something more efficient: in my case it runs inside an optimization loop, so it is recomputed on every iteration, and it's rather slow when a is large.
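If reps does not change between iterations (an assumption about the loop, which isn't shown here), the inner rep() call could at least be computed once outside the loop, leaving a single rep() per iteration. A rough sketch, where n_iter and the update to a are hypothetical stand-ins for the real optimization step:

```r
a <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)

# Computed once: valid as long as reps and ncol(a) stay fixed (assumption)
counts <- rep(reps, times = ncol(a))

n_iter <- 3  # hypothetical number of iterations
for (i in seq_len(n_iter)) {
  a_i <- a * i  # hypothetical stand-in for whatever changes a each iteration
  result <- matrix(rep(a_i, times = counts), ncol = ncol(a), byrow = FALSE)
}
```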

I'll note that there is a very similar existing question, but it is about repeating each row the same number of times. Another related question is also about efficiency, but it concerns replicating entire matrices.

UPDATE

Since I'm interested in efficiency, here is a simple comparison of the solutions provided so far. I'll update this as more come in, but in general it looks like the seq_along solution by F. Privé is the fastest.

library(dplyr)
library(tidyr)

a <- matrix(seq(9), ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)

rbenchmark::benchmark(
  "original solution" = {
    result <- matrix(rep(a, times = rep(reps, times = 3)),
      ncol = 3, byrow = FALSE)
  },
  "seq_along" = {
    result <- a[rep(seq_along(reps), reps), ]
  },
  "uncount" = {
    # Note: this returns a data.frame, not a matrix
    result <- as.data.frame(a) %>%
      uncount(reps)
  },
  replications = 1000,
  columns = c("test", "replications", "elapsed", "relative")
)

               test replications elapsed relative
1 original solution         1000   0.004    1.333
2         seq_along         1000   0.003    1.000
3           uncount         1000   1.722  574.000
jhelvy asked Mar 10 '26 12:03



1 Answer

Simply use a[rep(seq_along(reps), reps), ].
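As a sketch of why this works: rep(seq_along(reps), reps) builds a vector of row indices in which row i appears reps[i] times, and indexing a with it copies whole rows at once. The drop = FALSE below is an addition for safety, not part of the one-liner above; it keeps the result a matrix if the index ever has length one.

```r
a <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), ncol = 3, byrow = TRUE)
reps <- c(2, 3, 4)

# Row i appears reps[i] times in the index vector
idx <- rep(seq_along(reps), reps)          # 1 1 2 2 2 3 3 3 3

# Index by row; drop = FALSE keeps a matrix even when length(idx) == 1
result <- a[idx, , drop = FALSE]

# Same result as the double-rep() approach from the question
original <- matrix(rep(a, times = rep(reps, times = ncol(a))), ncol = ncol(a))
identical(result, original)                # TRUE
```

If reps is fixed across iterations of the loop, idx can be computed once up front so that each iteration only performs the indexing step.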

F. Privé answered Mar 12 '26 03:03



