Sample with replacement but constrain the maximum frequency of each element drawn

Tags: random, r, sample

Is it possible to extend the sample function in R so that it returns no more than, say, 2 copies of the same element when replace = TRUE?

Suppose I have a vector:

l = c(1,1,2,3,4,5)

To sample 3 elements with replacement, I would do:

sample(l, 3, replace = TRUE)

Is there a way to constrain its output so that the same element is returned at most 2 times? That is, (1,1,2) or (1,3,3) would be allowed, but (1,1,1) or (3,3,3) would be excluded.

asked Sep 30 '18 22:09

Thomas Moore

1 Answer

The basic idea is to convert sampling with replacement into sampling without replacement: build a pool in which each unique value appears as many times as it is allowed to be drawn, then sample from that pool without replacement.

set.seed(0)
ll <- unique(l)          ## unique values
#[1] 1 2 3 4 5
pool <- rep.int(ll, 2)   ## repeat the unique values so that each appears twice
#[1] 1 2 3 4 5 1 2 3 4 5
sample(pool, 3)          ## draw 3 values without replacement
#[1] 4 3 5

## replicate it a few times
## each column is one sample, after "simplification" of the results by `replicate`
replicate(5, sample(pool, 3))
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1    4    2    2    3
#[2,]    4    5    1    2    5
#[3,]    2    1    2    4    1
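
The steps above can be wrapped in a small helper. This is only a minimal sketch: the name `sample_capped` and its arguments are hypothetical, not part of base R. It indexes the pool with `sample.int` so that a numeric pool of length one is not interpreted by `sample` as `1:n`.

```r
## hypothetical helper: draw `size` values from `x`,
## with each unique value appearing at most `max_freq` times
sample_capped <- function(x, size, max_freq = 2L) {
  ux <- unique(x)
  ## `max_freq` may be a single cap or one cap per unique value
  pool <- rep(ux, times = rep_len(max_freq, length(ux)))
  if (size > length(pool)) stop("`size` exceeds the total number of allowed draws")
  ## sample positions rather than values, then index the pool
  pool[sample.int(length(pool), size)]
}

l <- c(1, 1, 2, 3, 4, 5)
s <- sample_capped(l, 3)
```

Note that, as in the code above, `unique` discards the duplicated 1 in `l`, so every distinct value starts with the same number of copies in the pool.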

If you want different values to appear up to different numbers of times, you can do, for example:

pool <- rep.int(ll, c(2, 3, 3, 4, 1))
#[1] 1 1 2 2 2 3 3 3 4 4 4 4 5

## draw 9 samples; replicate 5 times
oo <- replicate(5, sample(pool, 9))
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    5    1    4    3    2
# [2,]    2    2    4    4    1
# [3,]    4    4    1    1    1
# [4,]    4    2    3    2    5
# [5,]    1    4    2    5    2
# [6,]    3    4    3    3    3
# [7,]    1    4    2    2    2
# [8,]    4    1    4    3    3
# [9,]    3    3    2    2    4

We can call tabulate on each column to count the frequency of 1, 2, 3, 4, 5:

## set `nbins` in `tabulate` so that the frequency table of every column has the same length
apply(oo, 2L, tabulate, nbins = 5)
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    2    2    1    1    2
#[2,]    1    2    3    3    3
#[3,]    2    1    2    3    2
#[4,]    3    4    3    1    1
#[5,]    1    0    0    1    1

The counts in every column meet the frequency upper bounds c(2, 3, 3, 4, 1) that we set.
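
`tabulate` only counts positive integers. For other kinds of values, such as character strings, the same check can be sketched with `table` on a factor with fixed levels. The values and caps below are illustrative assumptions, not taken from the question.

```r
vals <- c("a", "b", "c")          ## illustrative values
caps <- c(a = 2, b = 1, c = 3)    ## illustrative upper bound for each value
pool <- rep(vals, times = caps)   ## each value appears as often as its cap

## draw 4 values without replacement; replicate 5 times
oo <- replicate(5, sample(pool, 4))

## count each value per column; fixing the levels keeps zero counts in place
counts <- apply(oo, 2L, function(col) table(factor(col, levels = vals)))

## TRUE if no column exceeds any cap (caps are recycled down each column)
all(counts <= caps)
```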

Would you explain the difference between rep and rep.int?

rep.int is not the "integer" method for rep. It is just a faster, simplified version of rep with less functionality. You can get more details on rep, rep.int and rep_len from the documentation page ?rep.
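
A short illustration of the difference, using a made-up named vector `x`:

```r
x <- c(a = 1, b = 2)     ## illustrative input

rep(x, times = 2)        ## keeps the names: a b a b
rep.int(x, times = 2)    ## same values, but the names are dropped

rep(x, each = 2)         ## `each` is supported by rep ...
## ... but not by rep.int, which only takes `x` and `times`
```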

answered Sep 20 '22 00:09

Zheyuan Li
