Generate data where cell counts are random, but row sums always the same

Question

I'm in a situation where I need to create a bunch of fake datasets where the sum of two variables is the same as in my real data, but the counts for each variable are random. Here's the setup:

>df
    X.1  X.2
1   145   30
2    55   73

The first row sums to 175, and the second to 128. What I'm looking for is a way to generate a data frame (or a bunch of data frames) like this:

>df.2
    X.1  X.2
1   100   75
2    90   38

In df.2, the cell counts have changed, but each row still sums to the same total as before. The actual data has hundreds of rows, but only two variables, if that helps. I've tried to figure out how to do this with sample() but haven't had any luck. Any suggestions?

Thanks!

Aaron left Stack Overflow · Accepted Answer

Perhaps you're looking for r2dtable?

> r2dtable(2, c(175,128), c(190, 113))
[[1]]
     [,1] [,2]
[1,]  108   67
[2,]   82   46

[[2]]
     [,1] [,2]
[1,]  114   61
[2,]   76   52

Also, here's a version of @mnel's answer that uses rmultinom to do the n replicates and then combines the results. Not that it really matters if you only need a few replicates, but since rmultinom could do it, I thought I'd see how it might be done.

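n <- 10  # number of replicate tables
e <- cbind(X1 = c(100, 90, 30), X2 = c(75, 28, 120))
tot <- rowSums(e)
# For each row, draw n multinomial samples using that row's total and observed proportions;
# sapply flattens each ncol x n result, giving an (ncol * n) x nrow matrix.
draws <- sapply(seq_len(nrow(e)), function(i) rmultinom(n, tot[i], e[i, ] / tot[i]))
# Reshape to ncol x n x nrow, then permute to nrow x ncol x n (one table per slice).
aperm(array(draws, dim = c(ncol(e), n, nrow(e))), c(3, 1, 2))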
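
Since the real data has only two columns, a multinomial draw with two categories is just a binomial draw, so the same idea can be written more simply. This is a minimal sketch, assuming you want each row's observed share in X.1 as the success probability (`sim_once` and the other names are made up for the example):

```r
df <- data.frame(X.1 = c(145, 55), X.2 = c(30, 73))
row_tot <- unname(rowSums(df))   # 175 and 128 for the example data
p1 <- df$X.1 / row_tot           # observed share of each row that falls in X.1

# One simulated data frame: draw X.1 per row, give the remainder to X.2
sim_once <- function() {
  x1 <- rbinom(length(row_tot), size = row_tot, prob = p1)
  data.frame(X.1 = x1, X.2 = row_tot - x1)
}

set.seed(42)                     # arbitrary seed, for repeatability only
sims <- replicate(5, sim_once(), simplify = FALSE)

# Each simulated data frame keeps the original row sums
rows_ok <- sapply(sims, function(d) all(rowSums(d) == row_tot))
rows_ok
```

Swap `p1` for 0.5 (or any other probability) if the fake data shouldn't resemble the observed split.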

Tags:

r

bosbmgatl
