I'm using R to generate permutations of a vector that has replicates in it.
While generating the permutations I'm using numbers to represent the groups. Here's what I can do for small ones:
unlist(unique(permn(c(1,1,2,2,3,3,4,4), paste0, collapse = "")))
Which returns a vector of 2520 permutations (8!/2^4)
The problem is I'm trying to scroll this up to 11 so that I can get every unique permutation of a 16 choose 11. In order to get every combination I do:
combs = unique(combn(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),11, paste0, collapse = ""))
and then would iterate through them and paste them together to get all unique 16 choose 11 permutations.
Sounds like a huge number?
It isn't. It's 525,525 rows, theoretically (16!/5!4!4!4!4!) The problem is that this method has to calculate all 174356582400 rows (that's 174 billion roughly) in groups of 39 million (11!) and do the unique operation on them.
Is there a method that shortcuts and factors in the replicates while finding the permutations?
Looking at other methods, I see that this would work:
unique(permutations(16,11, c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), set=FALSE))
except that it spends too much time doing it, and it is doing the same thing that I'm doing above by finding all the bad ones and then uniqueing them out
What you are looking for is permutations of multisets.
library(RcppAlgos)
multiPerm <- permuteGeneral(1:4, freqs = rep(2,4))
head(multiPerm)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 2 2 3 3 4 4
[2,] 1 1 2 2 3 4 3 4
[3,] 1 1 2 2 3 4 4 3
[4,] 1 1 2 2 4 3 3 4
[5,] 1 1 2 2 4 3 4 3
[6,] 1 1 2 2 4 4 3 3
Sanity check:
library(combinat)
library(gtools)
OPTestOne <- unlist(unique(permn(c(1,1,2,2,3,3,4,4), paste0, collapse = "")))
all.equal(sort(apply(multiPerm, 1, paste, collapse="")), sort(OPTestOne))
[1] TRUE
OPTestTwo <- unique(permutations(8,8,c(1,1,2,2,3,3,4,4), set=FALSE))
all.equal(OPTestTwo, multiPerm)
[1] TRUE
Here are some benchmarks:
library(microbenchmark)
microbenchmark(OP_One = unique(permn(c(1,1,2,2,3,3,4,4), paste0, collapse = "")),
Algos = permuteGeneral(1:4, freqs = rep(2,4)),
OP_Two = unique(permutations(8,8,c(1,1,2,2,3,3,4,4), set=FALSE)),
times = 5, unit = "relative")
Unit: relative
expr min lq mean median uq max neval
OP_One 8435.40 5570.476 5877.457 5562.094 5378.490 5409.687 5
Algos 1.00 1.000 1.000 1.000 1.000 1.000 5
OP_Two 15335.55 10095.646 10700.802 9982.139 9539.425 10295.974 5
Finding permutations of multisets choose m is no problem either.
system.time(multiPermChoose11 <- permuteGeneral(1:4, m = 11, freqs = rep(4, 4)))
user system elapsed
0.154 0.023 0.178
head(multiPermChoose11)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 1 1 1 1 2 2 2 2 3 3 3
[2,] 1 1 1 1 2 2 2 3 2 3 3
[3,] 1 1 1 1 2 2 2 3 3 2 3
[4,] 1 1 1 1 2 2 2 3 3 3 2
[5,] 1 1 1 1 2 2 3 2 2 3 3
[6,] 1 1 1 1 2 2 3 2 3 2 3
The OP's guess at how many permutations (525,525) for this latter example is not correct. Finding this is a little more involved than the one liner offered.
nrow(multiPermChoose11)
[1] 2310000
And just to show that this is correct:
length(unique(apply(multiPermChoose11, 1, paste, collapse ="")))
[1] 2310000
There is also a function from iterpc
that computes the number of permutations of multisets called np_multiset
iterpc::np_multiset(rep(4,4), 11)
[1] 2310000
For more information regarding problems like this in R, I wrote a thorough overview to the question: R: Permutations and combinations with/without replacement and for distinct/non-distinct items/multiset by @RandyLai (author of arrangements
and iterpc
, both of which are capable of doing the work above efficiently.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With