How can I split a data frame in R randomly?

I have a data frame with ca. 1000 rows, and I want to split it randomly into 8 smaller data frames, each containing 100 rows. I tried to use the sample function 8 times on the data frame, but sometimes it selects the same rows.
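The overlap happens because each call to sample() draws independently from all row numbers, so nothing stops two calls from picking the same row. A minimal sketch reproducing the problem, assuming an illustrative 1000-row data frame df1 (the column names here are only for demonstration):

```r
# Illustrative data only: 1000 rows
set.seed(1)
df1 <- data.frame(V1 = 1:1000, V2 = rnorm(1000))

# Eight independent draws of 100 row indices each
idx <- lapply(1:8, function(k) sample(nrow(df1), 100))

# Rows drawn in total vs. distinct rows drawn; the difference counts repeated rows
total_drawn <- length(unlist(idx))
distinct_drawn <- length(unique(unlist(idx)))
total_drawn - distinct_drawn
```

Each sample() call is without replacement on its own, but the 8 calls know nothing about each other, so the distinct count ends up well below 800.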

asked Apr 16 '16 10:04

Lanza

1 Answer

We create a grouping variable by sampling 1 to 8 with size equal to the number of rows of the dataset, split the sequence of row indices by that grouping variable into a list, loop through the list with lapply(...), subset the dataset, and keep the first 100 rows of each group with head.

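# Give each row a random group label 1..8, split the row indices by label,
# then keep the first 100 rows of each group
lst <- lapply(split(1:nrow(df1), sample(1:8, nrow(df1), replace=TRUE, prob = rep(1/8, 8))),
              function(i) head(df1[i,],100))
sapply(lst, nrow)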
#  1   2   3   4   5   6   7   8 
#100 100 100 100 100 100 100 100
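Because sample(1:8, nrow(df1), replace=TRUE) labels each row independently, the group sizes are random: they average nrow(df1) / 8 = 125 rows but are not fixed. head(df1[i,], 100) only returns 100 rows when a group has at least that many, so it can be worth checking the sizes. A small sketch that rebuilds the data from the answer and repeats the same grouping call on its own:

```r
set.seed(24)
df1 <- data.frame(V1 = 1:1000, V2 = rnorm(1000))

# Same grouping call as in the answer: one random label 1..8 per row
grp <- sample(1:8, nrow(df1), replace = TRUE, prob = rep(1/8, 8))

# Group sizes vary around nrow(df1) / 8 = 125
sizes <- table(grp)
print(sizes)

# TRUE only if every group has at least 100 rows, i.e. head(..., 100) yields 100 rows each
all(sizes >= 100)
```

If this returns FALSE, the affected group would come back with fewer than 100 rows from head.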

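As @RHertel mentioned in the comments, we can use a second sample to pick 100 random rows from each group instead of the first 100 (this stops with an error if a group has fewer than 100 rows):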

# Same grouping, but draw 100 random rows from each group
lst <- lapply(split(1:nrow(df1), sample(1:8, nrow(df1), replace=TRUE, prob = rep(1/8, 8))),
              function(i) df1[sample(i, 100, replace=FALSE),])

data

set.seed(24)
df1 <- data.frame(V1= 1:1000, V2= rnorm(1000))
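If every piece must have exactly 100 rows and no row may appear in two pieces, a different technique (not the one used in the answer) is to shuffle the row indices once with a single sample() call without replacement, then cut the shuffled vector into consecutive blocks of 100. A minimal sketch, assuming the 8 x 100 layout from the question; the remaining 200 rows are simply left out:

```r
set.seed(24)
df1 <- data.frame(V1 = 1:1000, V2 = rnorm(1000))

n_groups <- 8
group_size <- 100

# One draw without replacement: no row index can appear twice
shuffled <- sample(nrow(df1), n_groups * group_size)

# First 100 shuffled indices get label 1, the next 100 label 2, and so on
labels <- rep(seq_len(n_groups), each = group_size)

# Split the shuffled indices by label and subset the data frame for each group
lst <- lapply(split(shuffled, labels), function(i) df1[i, ])

# Every group has exactly 100 rows
sapply(lst, nrow)
```

Since the sizes are fixed in advance, neither head() nor a second sample() is needed, and the groups are disjoint by construction.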

answered Oct 18 '22 03:10

akrun
