Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Repeat rows of a data.frame N times

Tags:

dataframe

r

I have the following data frame:

data.frame(a = c(1,2,3),b = c(1,2,3))   a b 1 1 1 2 2 2 3 3 3 

I want to repeat the rows n times. For example, here the rows are repeated 3 times:

  a b 1 1 1 2 2 2 3 3 3 4 1 1 5 2 2 6 3 3 7 1 1 8 2 2 9 3 3 

Is there an easy function to do this in R? Thanks!

like image 581
Michael Avatar asked Jan 06 '12 04:01

Michael


People also ask

How do you repeat rows in a DataFrame?

repeat(3) will create a list where each index value will be repeated 3 times and df. iloc[df. index. repeat(3),:] will help generate a dataframe with the rows as exactly returned by this list.

How do you duplicate rows and times?

In the Copy and insert rows & columns dialog box, select Copy and insert rows option in the Type section, then select the data range you want to duplicate, and then specify the repeat time to duplicate the rows, see screenshot: 4.

How do you repeat a row multiple times in python?

In Python, if you want to repeat the elements multiple times in the NumPy array then you can use the numpy. repeat() function. In Python, this method is available in the NumPy module and this function is used to return the numpy array of the repeated items along with axis such as 0 and 1.

How do you repeat all rows in a DataFrame in R?

In R, the easiest way to repeat rows is with the REP() function. This function selects one or more observations from a data frame and creates one or more copies of them. Alternatively, you can use the SLICE() function from the dplyr package to repeat rows.


2 Answers

EDIT: updated to a better modern R answer.

You can use replicate(), then rbind the result back together. The rownames are automatically altered to run from 1:nrows.

d <- data.frame(a = c(1,2,3),b = c(1,2,3)) n <- 3 do.call("rbind", replicate(n, d, simplify = FALSE)) 

A more traditional way is to use indexing, but here the rowname altering is not quite so neat (but more informative):

 d[rep(seq_len(nrow(d)), n), ] 

Here are improvements on the above, the first two using purrr functional programming, idiomatic purrr:

purrr::map_dfr(seq_len(3), ~d) 

and less idiomatic purrr (identical result, though more awkward):

purrr::map_dfr(seq_len(3), function(x) d) 

and finally via indexing rather than list apply using dplyr:

d %>% slice(rep(row_number(), 3)) 
like image 147
mdsumner Avatar answered Sep 20 '22 17:09

mdsumner


For data.frame objects, this solution is several times faster than @mdsummer's and @wojciech-sobala's.

d[rep(seq_len(nrow(d)), n), ] 

For data.table objects, @mdsummer's is a bit faster than applying the above after converting to data.frame. For large n this might flip. microbenchmark.

Full code:

packages <- c("data.table", "ggplot2", "RUnit", "microbenchmark") lapply(packages, require, character.only=T)  Repeat1 <- function(d, n) {   return(do.call("rbind", replicate(n, d, simplify = FALSE))) }  Repeat2 <- function(d, n) {   return(Reduce(rbind, list(d)[rep(1L, times=n)])) }  Repeat3 <- function(d, n) {   if ("data.table" %in% class(d)) return(d[rep(seq_len(nrow(d)), n)])   return(d[rep(seq_len(nrow(d)), n), ]) }  Repeat3.dt.convert <- function(d, n) {   if ("data.table" %in% class(d)) d <- as.data.frame(d)   return(d[rep(seq_len(nrow(d)), n), ]) }  # Try with data.frames mtcars1 <- Repeat1(mtcars, 3) mtcars2 <- Repeat2(mtcars, 3) mtcars3 <- Repeat3(mtcars, 3)  checkEquals(mtcars1, mtcars2) #  Only difference is row.names having ".k" suffix instead of "k" from 1 & 2 checkEquals(mtcars1, mtcars3)  # Works with data.tables too mtcars.dt <- data.table(mtcars) mtcars.dt1 <- Repeat1(mtcars.dt, 3) mtcars.dt2 <- Repeat2(mtcars.dt, 3) mtcars.dt3 <- Repeat3(mtcars.dt, 3)  # No row.names mismatch since data.tables don't have row.names checkEquals(mtcars.dt1, mtcars.dt2) checkEquals(mtcars.dt1, mtcars.dt3)  # Time test res <- microbenchmark(Repeat1(mtcars, 10),                       Repeat2(mtcars, 10),                       Repeat3(mtcars, 10),                       Repeat1(mtcars.dt, 10),                       Repeat2(mtcars.dt, 10),                       Repeat3(mtcars.dt, 10),                       Repeat3.dt.convert(mtcars.dt, 10)) print(res) ggsave("repeat_microbenchmark.png", autoplot(res)) 
like image 21
Max Ghenis Avatar answered Sep 20 '22 17:09

Max Ghenis