I know how to do ordinary random sampling using R:
mysample <- mydata[sample(1:nrow(mydata), 100),]
However, I want to sample by id. Let me explain: my dataset looks like this:
id var1 var2 ...
1 5.1 1.2
1 4.7 0.9
2 3.3 1.6
3 3.4 5.7
4 7.9 1.3
Now I want to take a random sample of, say, 2 id numbers. If the random sample yields ids 1 and 4, my sample would look like this:
id var1 var2 ...
1 5.1 1.2
1 4.7 0.9
4 7.9 1.3
In other words, I'm sampling 2 id numbers, but I'm actually getting 3 cases.
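A quick way to see why 2 sampled ids can give 3 rows is to count how many rows each id contributes with `table()`. This is a small sketch assuming the example data above and the hypothetical draw of ids 1 and 4:

```r
# Example data from the question
mydata <- data.frame(
  id   = c(1, 1, 2, 3, 4),
  var1 = c(5.1, 4.7, 3.3, 3.4, 7.9),
  var2 = c(1.2, 0.9, 1.6, 5.7, 1.3)
)

# How many rows each id contributes
rows_per_id <- table(mydata$id)

# Suppose the draw yields ids 1 and 4, as in the question
drawn <- c(1, 4)

# Rows returned = sum of the row counts of the drawn ids
n_rows <- sum(rows_per_id[as.character(drawn)])
n_rows
# [1] 3
```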
How can I accomplish this in R?
Your data:
mydata <- read.table(text = "id var1 var2
1 5.1 1.2
1 4.7 0.9
2 3.3 1.6
3 3.4 5.7
4 7.9 1.3", header = TRUE)
Sample two id values:
set.seed(1)
ids <- sample(unique(mydata$id), 2) # important: the UNIQUE id numbers
# [1] 2 4   (the ids drawn can differ across R versions: R 3.6.0 changed sample()'s default algorithm)
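The `unique()` call matters because id 1 occupies two rows. Sampling the raw `id` column would make id 1 twice as likely to be picked and could return the same id twice. A small simulation sketch contrasting the two approaches (the seed and the number of simulations are arbitrary choices):

```r
mydata <- read.table(text = "id var1 var2
1 5.1 1.2
1 4.7 0.9
2 3.3 1.6
3 3.4 5.7
4 7.9 1.3", header = TRUE)

set.seed(42)
n_sims <- 10000

# Drawing 2 values straight from the id column: id 1 has two rows,
# so it is over-represented and can even be drawn twice
raw_draws <- replicate(n_sims, sample(mydata$id, 2))
dup_raw <- mean(raw_draws[1, ] == raw_draws[2, ])
dup_raw   # close to (2/5) * (1/4) = 0.1

# Drawing 2 values from the unique ids: every id is equally likely
# and the two ids are always distinct
uniq_draws <- replicate(n_sims, sample(unique(mydata$id), 2))
dup_uniq <- mean(uniq_draws[1, ] == uniq_draws[2, ])
dup_uniq  # always 0
```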
Extract the matching rows:
mydata[mydata$id %in% ids, ]
# id var1 var2
# 3 2 3.3 1.6
# 5 4 7.9 1.3
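If you need this repeatedly, the two steps can be wrapped in a small helper. `sample_by_id()` and its `id_col` argument are hypothetical names for this sketch, not part of any package; it uses only base R.

```r
# Hypothetical helper (not from any package): draw n distinct ids at random,
# then return every row belonging to those ids
sample_by_id <- function(df, n, id_col = "id") {
  ids <- unique(df[[id_col]])
  if (n > length(ids)) {
    stop("n is larger than the number of distinct ids")
  }
  # Index with sample.int() instead of calling sample(ids, n): when ids is a
  # single number, sample() would draw from 1:ids instead
  chosen <- ids[sample.int(length(ids), n)]
  df[df[[id_col]] %in% chosen, , drop = FALSE]
}

mydata <- read.table(text = "id var1 var2
1 5.1 1.2
1 4.7 0.9
2 3.3 1.6
3 3.4 5.7
4 7.9 1.3", header = TRUE)

set.seed(1)
res <- sample_by_id(mydata, 2)
res
```

Using `sample.int()` on positions avoids a known `sample()` pitfall with length-1 numeric input, which matters if a filtered dataset ever ends up with a single id.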