I have the following data frame:
data.frame(a = c(1,2,3),b = c(1,2,3)) a b 1 1 1 2 2 2 3 3 3
I want to repeat the rows n times. For example, here the rows are repeated 3 times:
a b 1 1 1 2 2 2 3 3 3 4 1 1 5 2 2 6 3 3 7 1 1 8 2 2 9 3 3
Is there an easy function to do this in R? Thanks!
repeat(3) will create a list where each index value will be repeated 3 times and df. iloc[df. index. repeat(3),:] will help generate a dataframe with the rows as exactly returned by this list.
In the Copy and insert rows & columns dialog box, select Copy and insert rows option in the Type section, then select the data range you want to duplicate, and then specify the repeat time to duplicate the rows, see screenshot: 4.
In Python, if you want to repeat the elements multiple times in the NumPy array then you can use the numpy. repeat() function. In Python, this method is available in the NumPy module and this function is used to return the numpy array of the repeated items along with axis such as 0 and 1.
In R, the easiest way to repeat rows is with the REP() function. This function selects one or more observations from a data frame and creates one or more copies of them. Alternatively, you can use the SLICE() function from the dplyr package to repeat rows.
EDIT: updated to a better modern R answer.
You can use replicate()
, then rbind
the result back together. The rownames are automatically altered to run from 1:nrows.
d <- data.frame(a = c(1,2,3),b = c(1,2,3)) n <- 3 do.call("rbind", replicate(n, d, simplify = FALSE))
A more traditional way is to use indexing, but here the rowname altering is not quite so neat (but more informative):
d[rep(seq_len(nrow(d)), n), ]
Here are improvements on the above, the first two using purrr
functional programming, idiomatic purrr:
purrr::map_dfr(seq_len(3), ~d)
and less idiomatic purrr (identical result, though more awkward):
purrr::map_dfr(seq_len(3), function(x) d)
and finally via indexing rather than list apply using dplyr
:
d %>% slice(rep(row_number(), 3))
For data.frame
objects, this solution is several times faster than @mdsummer's and @wojciech-sobala's.
d[rep(seq_len(nrow(d)), n), ]
For data.table
objects, @mdsummer's is a bit faster than applying the above after converting to data.frame
. For large n this might flip. .
Full code:
packages <- c("data.table", "ggplot2", "RUnit", "microbenchmark") lapply(packages, require, character.only=T) Repeat1 <- function(d, n) { return(do.call("rbind", replicate(n, d, simplify = FALSE))) } Repeat2 <- function(d, n) { return(Reduce(rbind, list(d)[rep(1L, times=n)])) } Repeat3 <- function(d, n) { if ("data.table" %in% class(d)) return(d[rep(seq_len(nrow(d)), n)]) return(d[rep(seq_len(nrow(d)), n), ]) } Repeat3.dt.convert <- function(d, n) { if ("data.table" %in% class(d)) d <- as.data.frame(d) return(d[rep(seq_len(nrow(d)), n), ]) } # Try with data.frames mtcars1 <- Repeat1(mtcars, 3) mtcars2 <- Repeat2(mtcars, 3) mtcars3 <- Repeat3(mtcars, 3) checkEquals(mtcars1, mtcars2) # Only difference is row.names having ".k" suffix instead of "k" from 1 & 2 checkEquals(mtcars1, mtcars3) # Works with data.tables too mtcars.dt <- data.table(mtcars) mtcars.dt1 <- Repeat1(mtcars.dt, 3) mtcars.dt2 <- Repeat2(mtcars.dt, 3) mtcars.dt3 <- Repeat3(mtcars.dt, 3) # No row.names mismatch since data.tables don't have row.names checkEquals(mtcars.dt1, mtcars.dt2) checkEquals(mtcars.dt1, mtcars.dt3) # Time test res <- microbenchmark(Repeat1(mtcars, 10), Repeat2(mtcars, 10), Repeat3(mtcars, 10), Repeat1(mtcars.dt, 10), Repeat2(mtcars.dt, 10), Repeat3(mtcars.dt, 10), Repeat3.dt.convert(mtcars.dt, 10)) print(res) ggsave("repeat_microbenchmark.png", autoplot(res))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With