R: How to split data into training and testing set, while preserving proportions & distributions of variables?

Question

Reproducible example:

library(caTools) #for sample.split function
set.seed(123)
# Creating example data frame
example_df <- data.frame(personID = stringi::stri_rand_strings(1000, 5),
                         sex = sample(1:2, 1000, replace = TRUE),
                         age = round(rnorm(1000, mean = 50, sd = 15), 0))

# Example of random splitting:
split_flag <- sample.split(example_df$personID)  # logical vector, TRUE = training row
training_set <- example_df[split_flag, ]
test_set <- example_df[!split_flag, ]

# Evaluation of variables in the training and test sets:
# Should be close to 1 (here it is 1.2, which is too high)
(sum(training_set$sex == 1) / sum(training_set$sex == 2)) / (sum(test_set$sex == 1) / sum(test_set$sex == 2))
## [1] 1.219139
# Should be close to 1 across the distribution (it is quite good; this is what I would expect)
summary(training_set$age) / summary(test_set$age)
##   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.7143  0.9756  1.0000  1.0032  1.0169  1.0000

Although the sample.split function divided age appropriately (the distributions match), the proportions of males and females in the sex variable differ significantly. What function can I use to split data automatically and evenly into multiple (in this example, two) sets while preserving the proportions and distributions of variables?

itsMeInMiami · Accepted Answer

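The caret package can build balanced sets for you with createDataPartition. It samples at random within the levels of the outcome y, so for a factor outcome the class proportions are preserved (for a numeric y it samples within percentile groups instead). The package vignette covers the basics. For example: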

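library(caret)    # for createDataPartition
library(mlbench)  # provides the Sonar data set
data(Sonar)

set.seed(107)
inTrain <- createDataPartition(
  y = Sonar$Class,
  ## the outcome data are needed
  p = .75,
  ## The percentage of data in the
  ## training set
  list = FALSE
)
training <- Sonar[ inTrain, ]
testing  <- Sonar[-inTrain, ]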
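createDataPartition stratifies on a single vector y; for the question's data that would be sex. To show the underlying idea without extra packages, here is a minimal base-R sketch of stratified sampling. It is an illustration under stated assumptions, not caret's implementation: stratified_split and sex_ratio are hypothetical helper names, and personID is generated with sprintf instead of stringi so the snippet runs on its own. Crossing sex with age quartiles via interaction() stratifies on both variables at once.

```r
# Hypothetical helper (not from caret or caTools): stratified random split.
# Rows are grouped by `strata`, and round(p * n) rows are drawn from each group.
stratified_split <- function(df, strata, p = 0.75) {
  idx_by_stratum <- split(seq_len(nrow(df)), strata, drop = TRUE)
  train_idx <- unlist(lapply(idx_by_stratum, function(idx) {
    # Index into idx rather than calling sample(idx): sample() on a
    # length-1 numeric vector would draw from 1:idx instead
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }), use.names = FALSE)
  train_idx <- sort(train_idx)
  list(train = df[train_idx, ], test = df[-train_idx, ])
}

set.seed(123)
example_df <- data.frame(
  personID = sprintf("P%04d", 1:1000),  # stand-in for stringi::stri_rand_strings
  sex = sample(1:2, 1000, replace = TRUE),
  age = round(rnorm(1000, mean = 50, sd = 15), 0)
)

# Ratio of sex proportions, as in the question's check
sex_ratio <- function(d) sum(d$sex == 1) / sum(d$sex == 2)

# 1) Stratify on sex only: the sex ratio should match up to rounding
s1 <- stratified_split(example_df, example_df$sex)
sex_ratio(s1$train) / sex_ratio(s1$test)

# 2) Stratify on sex crossed with age quartiles
age_q <- cut(example_df$age,
             breaks = quantile(example_df$age, probs = seq(0, 1, 0.25)),
             include.lowest = TRUE)
s2 <- stratified_split(example_df, interaction(example_df$sex, age_q))
sex_ratio(s2$train) / sex_ratio(s2$test)
summary(s2$train$age) / summary(s2$test$age)
```

Stratifying on many variables at once produces many small strata, where rounding to whole rows starts to dominate, so a few coarse strata (like the quartiles here) tend to work better than fine-grained ones.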


Tags: r, testing

juststuck · 1 answer · itsMeInMiami
