Randomly sample contiguous rows from a data frame or matrix

I want to sample a number of contiguous rows from a data frame df.

df <- data.frame(C1 = c(1, 2, 4, 7, 9), C2 = c(2, 4, 6, 8, 10))

I am trying to get something similar to the following, which samples 3 random rows and repeats the process 100 times:

test <- replicate(100, df[sample(1:nrow(df), 3, replace = TRUE), ], simplify = FALSE)

By contiguous, I mean the result should look something like:

 [[1]]  
           C1 C2
   2       2  4
   3       4  6
   4       7  8

 [[2]]
           C1 C2
   1       1  2
   2       2  4
   3       4  6

   .
   .
   .

How could I achieve this?

Eric González asked Jul 08 '18 15:07


1 Answer

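We only need to sample the starting row index of each chunk. For a chunk of chunk.size rows, the valid starting rows are 1 to nrow(DF) - chunk.size + 1, and the chunk is then the chunk.size consecutive rows from that start.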

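sample.block <- function (DF, chunk.size) {
  # no block of this size fits
  if (chunk.size > nrow(DF)) return(NULL)
  # valid starting rows are 1 .. nrow(DF) - chunk.size + 1
  start <- sample.int(nrow(DF) - chunk.size + 1, 1)
  # drop = FALSE keeps the result two-dimensional (e.g. single-column input)
  DF[start:(start + chunk.size - 1), , drop = FALSE]
}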

replicate(100, sample.block(df, 3), simplify = FALSE)
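
As a quick sanity check, here is a minimal sketch that confirms every sampled block really is contiguous. It is not part of the original answer: the helper is.contiguous and the seed value are assumptions introduced for illustration, and the check relies on df having default row names, so that a block's row names are its original row indices.

```r
# Illustrative check; is.contiguous and the seed are hypothetical additions.
df <- data.frame(C1 = c(1, 2, 4, 7, 9), C2 = c(2, 4, 6, 8, 10))

sample.block <- function (DF, chunk.size) {
  if (chunk.size > nrow(DF)) return(NULL)
  start <- sample.int(nrow(DF) - chunk.size + 1, 1)
  DF[start:(start + chunk.size - 1), , drop = FALSE]
}

set.seed(42)  # arbitrary seed, only to make the draws reproducible
blocks <- replicate(100, sample.block(df, 3), simplify = FALSE)

# With default row names, a block keeps its original row indices as row
# names, so consecutive differences of 1 mean the rows are contiguous.
is.contiguous <- function (block) {
  idx <- as.integer(rownames(block))
  all(diff(idx) == 1)
}

all(vapply(blocks, is.contiguous, logical(1)))

# How often each starting row was drawn
table(vapply(blocks, function (b) as.integer(rownames(b)[1]), integer(1)))
```

The all(...) call should return TRUE. The table shows how the 100 draws are spread over the three possible starting rows (1, 2 and 3), each of which sample.int picks with equal probability.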

Zheyuan Li answered Oct 12 '22 20:10