Given a dataframe df with a column called group, how do you randomly sample k groups from it in dplyr? It should return all rows from k groups (given there are at least k unique values in df$group), and every group in df should be equally likely to be returned.
An example of a simple random sample would be the names of 25 employees being chosen out of a hat from a company of 250 employees. In this case, the population is all 250 employees, and the sample is random because each employee has an equal chance of being chosen.
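The hat-drawing example can be sketched in base R as follows. This is only an illustration: the employee names are made up, and the seed is there purely so the draw can be repeated.

```r
# Hypothetical population: 250 employee IDs (illustrative names only).
employees <- paste0("employee_", 1:250)

set.seed(42)  # only to make the draw reproducible

# sample() draws without replacement by default, so every employee
# has the same chance of being among the 25 chosen.
chosen <- sample(employees, 25)
```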
The sampling units may be individuals or groups. For example, a study involving animals can select individual animals or groups of animals such as herds, farms, or administrative regions.
There are four primary random (probability) sampling methods: simple random sampling, systematic sampling, stratified sampling, and cluster sampling.
Definition: Random sampling is a sampling technique in which each member of the population has an equal probability of being chosen. A sample chosen randomly is meant to be an unbiased representation of the total population.
Just use sample() to choose some number of groups:

library(dplyr)
iris %>% filter(Species %in% sample(levels(Species), 2))
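The iris one-liner relies on Species being a factor, since levels() returns NULL for character or numeric columns. For the df and group column from the question, a minimal sketch might look like the following. The helper name sample_groups and the toy data are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)

# Hypothetical helper: keep all rows belonging to k randomly chosen groups.
sample_groups <- function(df, k) {
  groups <- unique(df$group)
  stopifnot(k <= length(groups))
  # Sample positions rather than values: sample(x, k) treats a single
  # number x as 1:x, which would misbehave if there were one numeric group.
  chosen <- groups[sample.int(length(groups), k)]
  df %>% filter(group %in% chosen)
}

# Toy data, made up for this example: 4 groups with 3 rows each.
set.seed(1)
df <- data.frame(group = rep(c("a", "b", "c", "d"), each = 3), value = 1:12)
result <- sample_groups(df, 2)
```

Because sample.int() draws without replacement and with equal weights by default, every group has the same chance of being selected, and filter() then returns all rows of the chosen groups.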