I'm programming in R. I've got a vector containing, let's say, 1000 values. Now let's say I want to partition these 1000 values randomly into two new sets, one containing 400 values and the other containing 600. How could I do this? I've thought about doing something like this...
firstset <- sample(mydata, size=400)
...but this doesn't partition the data (in other words, I still don't know which 600 values to put in the other set). I also thought about looping from 1 to 400, randomly removing 1 value at a time and placing it in firstset. This would partition the data correctly, but how to implement this is not clear to me. Plus I've been told to avoid for loops in R whenever possible.
Any ideas?
Instead of sampling the values, you could sample their positions.
positions <- sample(length(mydata), size=400) # ucfagls' suggestion
firstset <- mydata[positions]
secondset <- mydata[-positions]
EDIT: ucfagls' suggestion will be more efficient (especially for larger vectors), since it avoids allocating a vector of positions in R.
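For anyone who wants to try this end to end, here is a minimal self-contained sketch of the position-sampling idea. The seed and the rnorm() data are assumptions standing in for the real mydata; the last line just checks that the two sets together contain exactly the original values.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
mydata <- rnorm(1000)             # hypothetical stand-in for the real data

# Draw 400 distinct positions from 1..length(mydata) (without replacement)
positions <- sample(length(mydata), size = 400)

firstset  <- mydata[positions]    # the 400 sampled values
secondset <- mydata[-positions]   # everything that was not sampled

length(firstset)                  # 400
length(secondset)                 # 600

# TRUE if the two sets together hold exactly the original values
isTRUE(all.equal(sort(c(firstset, secondset)), sort(mydata)))
```

One thing to watch: if positions were ever empty (size = 0), mydata[-positions] would return an empty vector rather than the whole vector, so that edge case needs its own handling.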
Just shuffle mydata, then take the first 400 values and the last 600.
mydata <- sample(mydata)
firstset <- mydata[1:400]
secondset <- mydata[401:1000]
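A variant of the same idea that avoids hard-coding 1000 and leaves mydata untouched might look like the sketch below. The names shuffled and n_first are made up for illustration, and the runif() data is only a placeholder.

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
mydata  <- runif(1000)            # hypothetical stand-in for the real data
n_first <- 400                    # assumed size of the first set

shuffled  <- sample(mydata)             # random permutation of all values
firstset  <- head(shuffled, n_first)    # the first n_first values
secondset <- tail(shuffled, -n_first)   # everything after the first n_first

length(firstset)                  # 400
length(secondset)                 # 600
```

Using head() and tail() with a negative count means the second set is always "whatever is left", so the code still works if the length of mydata changes.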
If mydata is truly a vector, one option would be:
split(mydata, sample(c(rep("group1", 600), rep("group2", 400))))
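Since split() returns a named list, the two sets can then be pulled out by name. A small sketch, assuming a placeholder vector for mydata and the hypothetical names labels and groups:

```r
set.seed(7)                       # arbitrary seed, only for reproducibility
mydata <- seq_len(1000)           # hypothetical stand-in for the real data

# Build 600 "group1" labels and 400 "group2" labels, then shuffle them
labels <- sample(rep(c("group1", "group2"), times = c(600, 400)))

groups <- split(mydata, labels)   # named list with one element per label

lengths(groups)                   # group1 has 600 values, group2 has 400
firstset  <- groups$group2        # the 400-value set
secondset <- groups$group1        # the 600-value set
```

Unlike the two approaches above, this keeps the values in each group in their original relative order, because only the labels are shuffled.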