I am hoping to create all possible permutations of a vector containing two different values, in which I control the proportion of each of the values.
For example, if I have a vector of length three and I want all possible combinations containing a single 1, my desired output is a list looking like this:
list.1 <- list(c(1,0,0), c(0,1,0), c(0,0,1))
In contrast, if I want all possible combinations containing three 1s, my desired output is a list looking like this:
list.3 <- list(c(1,1,1))
To put it another way, the pattern of 1 and 0 values matters, but all 1s should be treated as identical to all other 1s.
Based on searching here and elsewhere, I've tried several approaches:
expand.grid(0:1, 0:1, 0:1) # this includes all possible combinations of 0, 1, 2, or 3 ones
library(combinat) # permn() comes from the combinat package
permn(c(0,1,1)) # this does not treat the ones as identical (e.g. it produces (0,1,1) twice)
unique(permn(c(0,1,1))) # this does the job!
So, using the function permn from the package combinat seems promising. However, when I scale this up to my actual problem (a vector of length 20, with 50% 1s and 50% 0s), I run into problems:
unique(permn(c(rep(1,10), rep(0, 10))))
# returns the error:
Error in vector("list", gamma(n + 1)) :
vector size specified is too large
My understanding is that this is happening because the call to permn builds a list containing every possible permutation, even though many of them are identical, and this list is too large for R to handle.
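To put rough numbers on that explanation (plain arithmetic, assuming n = 20 and k = 10 as in the question): permn() has to enumerate every ordering of the 20 elements, but only choose(20, 10) of those orderings are distinct vectors, because each distinct vector shows up k! * (n - k)! times. The variable names below are illustrative only.

```r
# Problem size from the question: length-20 vector with ten 1s and ten 0s.
n <- 20
k <- 10

n_perm <- factorial(n)      # orderings permn() tries to store: 2.432902e+18
n_distinct <- choose(n, k)  # distinct 0/1 vectors actually wanted: 184756

# Each distinct vector is repeated k! * (n - k)! times among the orderings,
# since swapping two 1s (or two 0s) leaves the vector unchanged.
n_perm / (factorial(k) * factorial(n - k))  # same value as choose(n, k)
```

So the list permn() tries to allocate is larger than the wanted result by a factor of (10!)^2, roughly 1.3e13.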
Does anyone have a suggestion for how to work around this?
Sorry if this has been answered previously; there are many, many SO questions containing similar language but different problems, and I have not been able to find a solution that meets my needs!
It should not be a dealbreaker that expand.grid includes every 0/1 vector, whatever its number of ones. Just subset the result afterwards:
combinations <- function(size, choose) {
d <- do.call("expand.grid", rep(list(0:1), size))
d[rowSums(d) == choose,]
}
combinations(size=10, choose=3)
# Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10
# 8 1 1 1 0 0 0 0 0 0 0
# 12 1 1 0 1 0 0 0 0 0 0
# 14 1 0 1 1 0 0 0 0 0 0
# 15 0 1 1 1 0 0 0 0 0 0
# 20 1 1 0 0 1 0 0 0 0 0
# 22 1 0 1 0 1 0 0 0 0 0
...
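The subset approach only ever materialises 2^size rows (1,048,576 for size = 20), which is why it stays tractable where permn() does not. If you need the list-of-vectors shape from the question rather than a data frame, one possible conversion is sketched below. combinations_as_list() is a hypothetical helper name, and the block repeats the answer's combinations() so it runs on its own.

```r
# The answer's helper: build the full 0/1 grid, keep rows with `choose` ones.
combinations <- function(size, choose) {
  d <- do.call("expand.grid", rep(list(0:1), size))
  d[rowSums(d) == choose, ]
}

# Hypothetical helper: return the rows as a list of plain numeric vectors,
# matching the list.1 / list.3 format used in the question.
combinations_as_list <- function(size, choose) {
  m <- as.matrix(combinations(size, choose))
  lapply(seq_len(nrow(m)), function(i) as.numeric(m[i, ]))
}

combinations_as_list(3, 1)
```

For size = 3 and choose = 1 this gives the same three vectors as list.1, in the same order. Memory still grows as 2^size, because expand.grid builds the full grid before the subset is taken.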
The problem is indeed that you are initially computing all factorial(20) (about 2.4e18) permutations, which will not fit in your memory.
What you are looking for is an efficient way to compute multiset permutations. The multicool package can do this:
library(multicool)
res <- allPerm(initMC(c(rep(0,10),rep(1,10) )))
This computation takes about two minutes on my laptop, but is definitely feasible.
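A base-R alternative, not taken from either answer: with only two distinct values, each multiset permutation is fully determined by which positions hold the 1s, so the k-element subsets of 1:n produced by combn() map one-to-one onto the wanted vectors. This sketch assumes the vectors contain only 0s and 1s, and binary_patterns() is a hypothetical name.

```r
# Hypothetical helper: every length-n vector with exactly k ones and n - k zeros.
# Each k-subset of positions 1..n (from combn) marks where the 1s go,
# so each distinct vector is produced exactly once.
binary_patterns <- function(n, k) {
  combn(n, k, FUN = function(pos) {
    v <- numeric(n)  # start from all zeros
    v[pos] <- 1      # place the 1s at the chosen positions
    v
  }, simplify = FALSE)
}

binary_patterns(3, 1)
res <- binary_patterns(20, 10)
length(res)  # choose(20, 10) = 184756
```

binary_patterns(3, 1) reproduces list.1 from the question. Memory is proportional to the number of distinct vectors rather than to 20! or 2^20, although the loop inside combn() is written in R, so it may not be the fastest option for much larger n.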