I'm in a situation where I need to create a bunch of fake datasets where the sum of two variables is the same as in my real data, but the counts for each variable are random. Here's the setup:
> df
  X.1 X.2
1 145  30
2  55  73
The first row sums to 175, and the second to 128. What I'm looking for is a way to generate a data frame (or a bunch of data frames) like this:
> df.2
  X.1 X.2
1 100  75
2  90  38
In df.2, the cell counts have changed, but the rows still sum to the same totals. The actual data has hundreds of rows, but only two variables, if that helps. I've tried to figure out how to do this with sample() but haven't had any luck. Any suggestions?
Thanks!
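Perhaps you're looking for r2dtable? It draws random two-way tables with the given row and column totals, so the column totals are held fixed as well as the row totals: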
> r2dtable(2, c(175,128), c(190, 113))
[[1]]
     [,1] [,2]
[1,]  108   67
[2,]   82   46

[[2]]
     [,1] [,2]
[1,]  114   61
[2,]   76   52
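To go from that output back to data frames shaped like df, something along these lines might work. This is only a sketch: it assumes you want the column totals taken from the observed data (colSums(df)) rather than the 190 and 113 used above, and the names sims and sims_df and the seed are made up for illustration.

```r
# Observed data from the question
df <- data.frame(X.1 = c(145, 55), X.2 = c(30, 73))

set.seed(1)  # arbitrary seed, only so the draws are reproducible

# Three random tables with the observed row AND column totals
# (assumption: column totals come from df itself)
sims <- r2dtable(3, rowSums(df), colSums(df))

# Turn each simulated matrix into a data frame with the original column names
sims_df <- lapply(sims, function(m) {
  out <- as.data.frame(m)
  names(out) <- names(df)
  out
})

# Check that every replicate keeps both sets of margins
rows_ok <- all(sapply(sims, function(m) all(rowSums(m) == rowSums(df))))
cols_ok <- all(sapply(sims, function(m) all(colSums(m) == colSums(df))))
```

If holding the column totals fixed is too restrictive for your case, r2dtable is probably not the right tool, since it always conditions on both margins.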
Also, here's a version of @mnel's answer that uses rmultinom to do the n replicates and then combines the results. Not that it really matters if you only need a few replicates, but since rmultinom could do it, I thought I'd see how it might be done.
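n <- 10  # number of replicates
e <- cbind(X1 = c(100,90,30), X2 = c(75,28,120))  # example counts, one row per observation
# For each row i, draw n multinomial samples using that row's total and observed proportions,
# then reshape into an nrow(e) x ncol(e) x n array; slice [, , k] is the k-th replicate.
aperm(array(sapply(1:nrow(e), function(i)
                rmultinom(n, rowSums(e)[i], (e/rowSums(e))[i,])),
            dim=c(ncol(e),n,nrow(e))), c(3,1,2))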
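Since the real data only has two variables, a multinomial draw per row reduces to a binomial draw, so a simpler version may be enough. The following is a rough sketch, not a tested solution for your data: simulate_rows is a hypothetical helper name, and using the observed share of X.1 as the success probability is an assumption (a constant such as 0.5 would make the split independent of the observed data).

```r
# Observed data from the question
df <- data.frame(X.1 = c(145, 55), X.2 = c(30, 73))

totals <- unname(rowSums(df))  # row totals to preserve: 175 and 128
p <- df$X.1 / totals           # assumption: use the observed share of X.1 per row

# Hypothetical helper: one fake data set with the same row totals
simulate_rows <- function(totals, p) {
  x1 <- rbinom(length(totals), size = totals, prob = p)  # random count for X.1
  data.frame(X.1 = x1, X.2 = totals - x1)                # X.2 takes the remainder
}

set.seed(42)  # arbitrary seed, only so the draws are reproducible
fake <- replicate(5, simulate_rows(totals, p), simplify = FALSE)

# Every fake data set should reproduce the original row totals
rows_ok <- all(sapply(fake, function(d) all(rowSums(d) == totals)))
```

Here fake is a list of five data frames; unlike r2dtable, only the row totals are held fixed, so the column totals vary from one replicate to the next.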