Taking a disproportionate sample from a dataset in R

Tags: random, r, sampling

If I have a large dataset in R, how can I take a random sample that takes the distribution of the original data into account? In particular, the data are skewed, with only 1% belonging to a minority class, and I want to take a biased sample of the data.

asked Apr 20 '12 05:04

simplyme

1 Answer

The sample(x, size, replace = FALSE, prob = NULL) function draws a sample of size `size` from a vector x. The sample can be drawn with or without replacement, and the selection probabilities can either be equal for all elements or given as a vector of weights supplied by the user.
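As a quick illustration of those arguments, here is a minimal sketch on a toy vector. The vector `x`, the weights, and the seed are made up for the example; note that the weights in prob do not have to sum to 1.

```r
set.seed(42)  # assumed seed, only so the example is reproducible

x <- c("a", "b", "c")  # toy vector, not from the question

# Equal probabilities, without replacement (the defaults)
s1 <- sample(x, size = 2)

# Unequal weights, with replacement; sample() normalizes the weights
s2 <- sample(x, size = 10, replace = TRUE, prob = c(1, 1, 8))

table(s2)  # "c" should tend to dominate, since it carries most of the weight
```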

If you want a sample of 50 cases in which every row has the same probability of being selected, all you have to do is

n <- 50
# pick n row indices uniformly at random, without replacement
smpl <- df[sample(nrow(df), n), ]
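An equal-probability sample like this tends to mirror the distribution of the original data, which can be checked by comparing class proportions. The sketch below assumes a hypothetical data frame with a `class` column in which 1% of rows are in the minority class; the column names, sizes, and seed are illustrative only.

```r
set.seed(1)  # assumed seed, for reproducibility only

# Hypothetical data: 10,000 rows, 1% in the "minor" class
df <- data.frame(
  id    = 1:10000,
  class = rep(c("minor", "major"), times = c(100, 9900))
)

n <- 1000
smpl <- df[sample(nrow(df), n), ]

prop.table(table(df$class))    # proportions in the original data
prop.table(table(smpl$class))  # proportions in the sample, expected to be close
```

With a small n, the minority class may end up with very few rows or none at all, which is the usual motivation for weighting the draw instead.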

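However, if you want the rows to have different probabilities of being selected, say a weight of 0.25 for rows where sex is M and 0.75 for rows where sex is F, you should do the following. Note that prob holds one weight per row, so at each draw an F row is three times as likely to be picked as an M row; this does not by itself guarantee a 25/75 split in the sample.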

n <- 50
# one weight per row: 0.25 for M, 0.75 for F
prb <- ifelse(df$sex == "M", 0.25, 0.75)
smpl <- df[sample(nrow(df), n, prob = prb), ]
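For the 1% minority class in the question, the same idea could be applied in two ways, sketched below on a hypothetical data frame (the `class` column, the sizes, the seed, and the choice of 25 rows per class are all assumptions for illustration). The first approach weights each row by the inverse of its class frequency, so the classes compete on roughly equal terms. The second draws a fixed number of rows from each class, which is not what the answer above does but gives exact control over the class counts.

```r
set.seed(1)  # assumed seed, for reproducibility only

# Hypothetical data: 10,000 rows, 1% in the "minor" class
df <- data.frame(
  id    = 1:10000,
  class = rep(c("minor", "major"), times = c(100, 9900))
)
n <- 50

# Approach 1: per-row weights inversely proportional to class frequency
freq   <- table(df$class)
prb    <- 1 / as.numeric(freq[as.character(df$class)])
smpl_w <- df[sample(nrow(df), n, prob = prb), ]
table(smpl_w$class)  # class counts vary from run to run

# Approach 2: a fixed number of rows from each class
per_class <- 25  # must not exceed the size of the smallest class
rows_by_class <- split(seq_len(nrow(df)), df$class)
idx <- unlist(
  lapply(rows_by_class, function(i) i[sample.int(length(i), per_class)]),
  use.names = FALSE
)
smpl_s <- df[idx, ]
table(smpl_s$class)  # exactly per_class rows in each class
```

Either way the resulting sample is deliberately biased, so any estimate computed from it would need to be reweighted to describe the original data.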

answered Oct 27 '22 23:10

João Daniel
