I would like to generate a random contingency table with the marginals all equal.
Easiest example would be to have the table:
3 3 3 | 9
3 3 3 | 9
3 3 3 | 9
_ _ _
9 9 9
So sum(r_i) = sum(c_j) =9. I would like to find all the contingency tables that meet this criteria, and then be able to analyze some features for that set of tables.
Is there an "easy" way to generate these tables in R?
Your question isn't completely precise. Generating random contingency tables is easy. Finding all contingency tables that meet those criteria might be harder, since the probabilities of tables are highly non-uniform and it will take a very large sample to make sure you have them all. (Someone gave the beginnings of a solution for deterministic enumeration based on the partitions package, but seems to have deleted their answer ...) The r2dtable in the stats package (a core package) samples random tables:
Generating just 1 sample (the results are returned as a list):
set.seed(101)
r2dtable(n=1,r=c(9,9,9),c=c(9,9,9))[[1]]
## [,1] [,2] [,3]
## [1,] 4 3 2
## [2,] 2 4 3
## [3,] 3 2 4
How likely is your example?
set.seed(102)
tList <- r2dtable(n=50000,r=c(9,9,9),c=c(9,9,9))
Convert results to strings for easier comparisons:
vals <- sapply(tList,function(x) paste(c(x),collapse=""))
How many are there?
length(unique(vals)) ## 1018
Update: a much larger sample (n=500000) gave 1276 unique tables. This seems more plausible on the grounds of symmetries, but may not be complete -- based on the log-frequency distribution, there's probably a longer tail I haven't caught yet.
In fact there is: this web page gives a way to calculate the number of tables; there are 1540 for all margins equal to 9.
Log-frequency distribution:
plot(log10(rev(sort(table(vals)))),type="l")

Most common tables:
head(rev(sort(table(vals))))
## vals
## 333333333 342324333 333324342 333342324 423333243 234333432
## 996 626 626 605 596 592
(For extra credit, I should try to collapse symmetric cases.)
Probability of all-equal:
mean(vals=="333333333") ## 0.1992
The deterministic approach (which I hope the owner will restore) starts with the compositions() function from the partitions package, which enumerates all the ways of partitioning an integer N into n components: compositions(9,3) gives all the sets of 3 non-negative integers that sum to 9, which represents all the possible rows/columns in your contingency matrix.
I'm still thinking about how to take these raw materials and combine them to enumerate tables: there must be at least 1276 of them, so it's not just all permutations of individual compositions (which would only give 3!*55=330).
This is a start but doesn't actually work:
library("partitions")
cc <- compositions(9,3)
too.many <- combn(split(cc,col(cc)),3,
FUN=function(x) do.call(cbind,x),
simplify=FALSE) ## 26235
ok <- sapply(too.many,function(x) all(rowSums(x)==9))
Only 252 OK? Maybe we need to allow all permutations of these outcomes (which would allow 252*6 = 1512, a plausible result ...) ?
This may come off as a crazy answer to this question, and it is a crazy (and also incomplete) answer. But, it's related to your intended result and is also sudoku. So that's funny.
library("sudokuAlt")
g <- matrix(as.numeric(makeGame(3,0)), nrow = 9)
colSums(g)
# [1] 45 45 45 45 45 45 45 45 45
rowSums(g)
# [1] 45 45 45 45 45 45 45 45 45
Basically, a sudoku puzzle is a specific case of what you're looking for. This also adds the constraint that each row and each column not only have the same marginals but also the exact same combination of elements. This package only implements puzzles that 9x9, 16x16, or 25x25, but you could look at the source code to see how they generate the puzzles, and probably build a more general solution from there.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With