
Sample n rows in permutations table resulting in similar element frequencies by column in R

I am working with R and am faced with the following combinatorial problem. The initial situation is a data frame with 512 rows containing all possible triple combinations of the digits 1 to 8:

expand.grid(rep(list(1:8), 3))

Now I would like to sample 420 rows from this data frame so that the frequency of each digit in each column is as similar as possible.
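As a quick check on what "as similar as possible" can mean here: 420 is not divisible by 8, so perfectly equal counts are impossible. A minimal sketch of the arithmetic, with illustrative variable names that do not come from the question:

```r
n_rows   <- 420
n_digits <- 8

target <- n_rows / n_digits      # average count per digit and column
lo <- floor(target)              # smallest count in a best-case sample
hi <- ceiling(target)            # largest count in a best-case sample
n_hi <- n_rows - lo * n_digits   # how many digits must appear `hi` times

c(target = target, lo = lo, hi = hi, n_hi = n_hi)
```

So the best any 420-row sample can do is four digits appearing 53 times and four appearing 52 times in each column.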

A purely random sample can be drawn as shown below, but depending on chance its digit frequencies fluctuate considerably.

library(dplyr)  # provides %>%, filter() and row_number()

expand.grid(rep(list(1:8), 3)) %>%
  filter(row_number() %in% sample(1:nrow(.), 420))
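One way to quantify the fluctuation is to take, for each column, the difference between the most and the least frequent digit. The following is only a sketch in base R; the names `S` and `spread` are illustrative assumptions, and the values will differ from run to run because no seed is set.

```r
G <- expand.grid(rep(list(1:8), 3))

# Plain random sample of 420 of the 512 rows.
S <- G[sample(nrow(G), 420), ]

# Per column: max count minus min count over the digits 1..8.
# factor(..., levels = 1:8) keeps a digit in the table even if its count is 0.
spread <- sapply(S, function(col) diff(range(table(factor(col, levels = 1:8)))))
spread
```

A spread of 1 in every column would be the best achievable result, since 420 / 8 is not an integer.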

Does some sort of constraint exist in order to obtain frequencies that are as equal as possible?

Edit: However, the result doesn't have to be random. Is there a way to filter 420 rows with maximally equal frequencies?

MaVe asked Oct 23 '25 04:10



1 Answer

Stratified Sampling

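Note that expand.grid builds the variables so that the first varies fastest and the last slowest, so every block of 8 consecutive rows shares the same values of Var2 and Var3. We can therefore use stratified sampling: divide the rows into 8*8 = 64 groups (strata) and sample 6 or 7 rows from each, since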

 420/64
[1] 6.5625

R code for this follows:

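set.seed(7 * 11 * 13)
G <- expand.grid(rep(list(1:8), 3))

# Each row of M holds the 8 row indices of one stratum (fixed Var2 and Var3).
M <- matrix(1:512, 64, 8, byrow=TRUE)
# Draw 6 or 7 indices from each stratum, with equal probability.
rows <- apply(M, 1, \(x) sample(x, ifelse(runif(1) <= 0.5, 6, 7))) |> unlist()
m <- length(rows)
# Top up to 420 from the rows not yet chosen (this assumes m <= 420).
DIFF <- setdiff(1:512, rows)
morerows <- sample(DIFF, 420 - m)
rows <- c(rows, morerows)
GG <- G[rows, ]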
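Because each stratum size is drawn independently, the first pass can end up with more than 420 rows, in which case 420 - m is negative and sample() stops with an error. A possible variant, shown here as a sketch and not as the method used above, fixes the stratum sizes in advance: 28 strata of 6 and 36 strata of 7 give 28*6 + 36*7 = 420 exactly. The names `sizes`, `rows2` and `GG2` and the seed are illustrative assumptions.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
G <- expand.grid(rep(list(1:8), 3))
M <- matrix(1:512, 64, 8, byrow = TRUE)  # one stratum per row, as above

# Shuffle a fixed multiset of stratum sizes: 28 sixes and 36 sevens.
sizes <- sample(rep(c(6, 7), times = c(28, 36)))

# Sample the prescribed number of row indices from each stratum.
rows2 <- unlist(lapply(seq_len(64), function(i) sample(M[i, ], sizes[i])))
GG2 <- G[rows2, ]

nrow(GG2)
lapply(GG2, table)
```

This guarantees the row count without a top-up step. The per-column frequencies still vary with the random draws, so they should be inspected in the same way as before.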

Then looking at frequency tables for each variable:

lapply(GG, table)
$Var1

 1  2  3  4  5  6  7  8 
55 49 53 52 50 54 51 56 

$Var2

 1  2  3  4  5  6  7  8 
51 54 53 54 51 51 52 54 

$Var3

 1  2  3  4  5  6  7  8 
53 53 50 54 54 54 50 52 
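
Since the result does not have to be random, a deterministic selection can reach the best possible balance of 52 or 53 per digit in every column. The following is an alternative sketch, not part of the stratified approach above. It assumes a Latin-square style split of the grid into 8 classes by (Var3 - Var1 - Var2) mod 8, each of which contains every digit exactly 8 times per column. Six full classes give 384 rows, and 36 more rows are picked from a seventh class. All helper names (`x`, `y`, `z`, `k`, `d`, `keep`, `balanced`) are illustrative.

```r
G <- expand.grid(rep(list(1:8), 3))

# Work with digits 0..7 so that modular arithmetic is straightforward.
x <- G$Var1 - 1
y <- G$Var2 - 1
z <- G$Var3 - 1

k <- (z - x - y) %% 8   # class of each row, 0..7, 64 rows per class
d <- (y - x) %% 8       # offset between the first two digits

keep <- k <= 5 |                      # six full classes: 6 * 64 = 384 rows
  (k == 6 & d <= 3) |                 # 32 rows, each digit 4 times per column
  (k == 6 & d == 4 & x <= 3)          # 4 rows with distinct digits per column

balanced <- G[keep, ]
nrow(balanced)
lapply(balanced, table)
```

Every column of `balanced` then contains four digits 53 times and four digits 52 times, which is the optimum for 420 rows.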
kjetil b halvorsen answered Oct 25 '25 17:10



