What is an efficient way to generate a random contingency table? A contingency table is defined as a rectangular matrix such that the sum of each row is fixed, and the sum of each column is fixed, but the individual elements may be anything as long as the sum of each row and column is correct.
Note that it's very easy to generate random contingency tables, but I'm looking for something more efficient than the naive algorithm.
Creating a basic contingency table. To create a contingency table of the data in the var1 column cross-classified with the data in the var2 column, choose the Stat > Tables > Contingency > With Data menu option. Select var1 as the Row variable, choose var2 as the Column variable, and click Compute!.
The grand total is the number of outcomes for the denominator. Consequently, to calculate joint probabilities in a contingency table, take each cell count and divide by the grand total. For our example, the joint probability of females buying Macs equals the value in that cell (87) divided by the grand total (223).
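As a quick check of the arithmetic above, here is a minimal sketch. The function name `joint_probability` is just an illustrative choice; the only numbers used are the cell count (87) and grand total (223) quoted in the text.

```python
def joint_probability(cell_count, grand_total):
    """Joint probability of one cell: its count divided by the grand total."""
    if grand_total <= 0:
        raise ValueError("grand total must be positive")
    return cell_count / grand_total


# Females buying Macs, using the figures quoted above.
p = joint_probability(87, 223)
print(f"{p:.3f}")  # prints 0.390
```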
Use contingency tables to understand the relationship between categorical variables. For example, is there a relationship between gender (male/female) and type of computer (Mac/PC)? I love these tables because they organize your data and allow you to answer diverse questions.
Looking at the code of the networksis package for R might be helpful. I believe that efficient computation requires fairly sophisticated Monte Carlo techniques (Markov chain or sequential importance sampling), so you might want to avoid reimplementing them if you can.
Edit: The relevant paper is Chen, Diaconis, Holmes, and Liu (2005). In the words of the authors, "[o]ur method compares favorably with other existing Monte Carlo-based algorithms, and sometimes is a few orders of magnitude more efficient."
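If you do want something self-contained to experiment with, a much simpler alternative is a Markov chain random walk. To be clear, this is not the sequential importance sampling method from the paper, and it is likely far less efficient. The sketch below assumes non-negative integer entries and at least two rows and two columns. It builds one valid table greedily, then repeatedly picks two rows and two columns and applies a +1/-1 checkerboard move, which leaves every row and column sum unchanged. The function names and the default `steps` value are arbitrary choices for illustration; how many steps are needed for the result to be close to uniform depends on the margins.

```python
import random


def initial_table(row_sums, col_sums):
    """Build one valid table greedily (north-west corner rule)."""
    if sum(row_sums) != sum(col_sums):
        raise ValueError("row and column sums must have the same total")
    rows, cols = list(row_sums), list(col_sums)
    table = [[0] * len(cols) for _ in rows]
    i = j = 0
    while i < len(rows) and j < len(cols):
        x = min(rows[i], cols[j])  # put as much as possible in this cell
        table[i][j] = x
        rows[i] -= x
        cols[j] -= x
        if rows[i] == 0:
            i += 1  # row satisfied, move down
        else:
            j += 1  # column satisfied, move right
    return table


def random_walk_step(table, rng):
    """Try one +1/-1 move on a random 2x2 sub-rectangle; margins are preserved."""
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    # The move subtracts 1 from (i1, j2) and (i2, j1); skip it if that
    # would make an entry negative.
    if table[i1][j2] == 0 or table[i2][j1] == 0:
        return
    table[i1][j1] += 1
    table[i2][j2] += 1
    table[i1][j2] -= 1
    table[i2][j1] -= 1


def sample_table(row_sums, col_sums, steps=10_000, seed=None):
    """Run the random walk for `steps` moves (an arbitrary default)."""
    rng = random.Random(seed)
    table = initial_table(row_sums, col_sums)
    for _ in range(steps):
        random_walk_step(table, rng)
    return table


table = sample_table([3, 2, 5], [4, 4, 2], seed=1)
for row in table:
    print(row)
```

Because the pair of rows and the pair of columns are each drawn in random order, both orientations of the checkerboard move are proposed with equal probability.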
This sounds like a constraint satisfaction problem (CSP) to me.
You would basically start at some cell and choose its value randomly from the set of allowed values. Then you update the sets of eligible values for all cells in the same row and column, pick the next cell (according to the CSP heuristic you are using), and randomly assign it a value from its set of eligible values, updating the affected row and column again. If you encounter a cell whose set of eligible values is empty, you have to backtrack.
However, the notion of 'set of eligible values' might be hard to represent in a data structure, depending on the range of values you are allowing.
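One way to keep the eligible set cheap to represent is to restrict the problem: assuming non-negative integer entries and a fixed row-major fill order, each cell's eligible set is just an interval `[lo, hi]`. The sketch below is a simplification of the general CSP approach under those assumptions, and the function name `fill_table` is made up for illustration. With these bounds the interval is never empty in this ordering, so no backtracking is needed. Note that the resulting tables are valid but are not drawn uniformly from all tables with the given margins.

```python
import random


def fill_table(row_sums, col_sums, seed=None):
    """Fill cells in row-major order, each from its interval of eligible values."""
    if sum(row_sums) != sum(col_sums):
        raise ValueError("row and column sums must have the same total")
    rng = random.Random(seed)
    col_rem = list(col_sums)  # what each column still needs
    table = []
    for row_sum in row_sums:
        row_rem = row_sum  # what this row still needs
        row = []
        for j in range(len(col_rem)):
            # Capacity left in the columns to the right of this cell.
            right_capacity = sum(col_rem[j + 1:])
            # Lower bound: whatever the remaining columns cannot absorb.
            lo = max(0, row_rem - right_capacity)
            # Upper bound: limited by both the row and the column.
            hi = min(row_rem, col_rem[j])
            x = rng.randint(lo, hi)
            row.append(x)
            row_rem -= x
            col_rem[j] -= x
        table.append(row)
    return table


for row in fill_table([3, 2, 5], [4, 4, 2], seed=0):
    print(row)
```

The last cell of each row and every cell of the last row end up forced (`lo == hi`), which is the interval version of "updating the sets of eligible values" after each assignment.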