I have a data frame with ca. 1000 rows, and I want to split it randomly into 8 smaller data frames, each containing 100 rows. I tried using the sample function 8 times on the data frame, but sometimes it selects the same rows.
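One way to avoid overlap entirely is to draw all the row indices in a single call to sample, so no index can be picked twice, and then cut the permutation into eight consecutive blocks of 100. A minimal sketch, using made-up example data (df1 here is a stand-in for your data frame):

```r
# Hypothetical example data standing in for the real data frame
set.seed(1)
df1 <- data.frame(V1 = 1:1000, V2 = rnorm(1000))

# Draw 8 * 100 = 800 distinct row indices in one call, then split
# them into eight blocks of 100 rows each
idx <- sample(nrow(df1), 8 * 100)
lst <- split(df1[idx, ], rep(1:8, each = 100))

sapply(lst, nrow)  # every piece has exactly 100 rows
```

Because the 800 indices are drawn without replacement, the eight pieces are disjoint by construction.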
The answer below relies on base R's split() function, which divides the data in x into the groups defined by f. Its syntax is: split(x, f, drop = FALSE, …)
We create a grouping variable by sampling 1 to 8 with size equal to the number of rows of the dataset, split the sequence of rows by that grouping variable into a list, loop through the list (lapply(...)), subset the dataset, and take the first 100 rows of each subset with head:
lst <- lapply(split(1:nrow(df1), sample(1:8, nrow(df1), replace = TRUE, prob = rep(1/8, 8))),
              function(i) head(df1[i, ], 100))
sapply(lst, nrow)
# 1 2 3 4 5 6 7 8
#100 100 100 100 100 100 100 100
As @RHertel mentioned in the comments, we can take a second sample to get exactly 100 rows from each group:
lst <- lapply(split(1:nrow(df1), sample(1:8, nrow(df1), replace = TRUE, prob = rep(1/8, 8))),
              function(i) df1[sample(i, 100, replace = FALSE), ])
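Either way, the pieces cannot overlap, because split() partitions the row indices so each row lands in exactly one group. A quick sanity check on the head() variant (df1 as in the data below; note, as a caveat, that in the second variant sample(i, 100, replace = FALSE) would error on the rare occasion a group receives fewer than 100 of the 1000 rows):

```r
set.seed(24)
df1 <- data.frame(V1 = 1:1000, V2 = rnorm(1000))

grp <- sample(1:8, nrow(df1), replace = TRUE)
lst <- lapply(split(seq_len(nrow(df1)), grp),
              function(i) head(df1[i, ], 100))

# 0 means no row appears in more than one of the eight pieces
anyDuplicated(unlist(lapply(lst, `[[`, "V1")))
```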
data

set.seed(24)
df1 <- data.frame(V1 = 1:1000, V2 = rnorm(1000))