Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Permutations of data sets with duplicates in R

Tags:

r

permutation

I'm using R to generate permutations of a vector that has replicates in it.

While generating the permutations I'm using numbers to represent the groups. Here's what I can do for small ones:

unlist(unique(permn(c(1,1,2,2,3,3,4,4), paste0, collapse = "")))

Which returns a vector of 2520 permutations (8!/2^4)

The problem is I'm trying to scroll this up to 11 so that I can get every unique permutation of a 16 choose 11. In order to get every combination I do:

combs = unique(combn(c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),11, paste0, collapse = ""))

and then would iterate through them and paste them together to get all unique 16 choose 11 permutations.

Sounds like a huge number?

It isn't. It's 525,525 rows, theoretically (16!/5!4!4!4!4!) The problem is that this method has to calculate all 174356582400 rows (that's 174 billion roughly) in groups of 39 million (11!) and do the unique operation on them.

Is there a method that shortcuts and factors in the replicates while finding the permutations?

Looking at other methods, I see that this would work:

unique(permutations(16,11, c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), set=FALSE))

except that it spends too much time doing it, and it is doing the same thing that I'm doing above by finding all the bad ones and then uniqueing them out

like image 258
Duncan McPherson Avatar asked Feb 20 '18 21:02

Duncan McPherson


1 Answers

What you are looking for is permutations of multisets.

library(RcppAlgos)

multiPerm <- permuteGeneral(1:4, freqs = rep(2,4))

head(multiPerm)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    2    2    3    3    4    4
[2,]    1    1    2    2    3    4    3    4
[3,]    1    1    2    2    3    4    4    3
[4,]    1    1    2    2    4    3    3    4
[5,]    1    1    2    2    4    3    4    3
[6,]    1    1    2    2    4    4    3    3

Sanity check:

library(combinat)
library(gtools)
OPTestOne <- unlist(unique(permn(c(1,1,2,2,3,3,4,4), paste0, collapse = "")))
all.equal(sort(apply(multiPerm, 1, paste, collapse="")), sort(OPTestOne))
[1] TRUE

OPTestTwo <- unique(permutations(8,8,c(1,1,2,2,3,3,4,4), set=FALSE))
all.equal(OPTestTwo, multiPerm)
[1] TRUE  

Here are some benchmarks:

library(microbenchmark)
microbenchmark(OP_One = unique(permn(c(1,1,2,2,3,3,4,4), paste0, collapse = "")),
               Algos = permuteGeneral(1:4, freqs = rep(2,4)),
               OP_Two = unique(permutations(8,8,c(1,1,2,2,3,3,4,4), set=FALSE)),
               times = 5, unit = "relative")
Unit: relative
  expr      min        lq      mean   median       uq       max neval
OP_One  8435.40  5570.476  5877.457 5562.094 5378.490  5409.687     5
 Algos     1.00     1.000     1.000    1.000    1.000     1.000     5
OP_Two 15335.55 10095.646 10700.802 9982.139 9539.425 10295.974     5

Finding permutations of multisets choose m is no problem either.

system.time(multiPermChoose11 <- permuteGeneral(1:4, m = 11, freqs = rep(4, 4)))
 user  system elapsed 
0.154   0.023   0.178

head(multiPermChoose11)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    1    1    1    1    2    2    2    2    3     3     3
[2,]    1    1    1    1    2    2    2    3    2     3     3
[3,]    1    1    1    1    2    2    2    3    3     2     3
[4,]    1    1    1    1    2    2    2    3    3     3     2
[5,]    1    1    1    1    2    2    3    2    2     3     3
[6,]    1    1    1    1    2    2    3    2    3     2     3

The OP's guess at how many permutations (525,525) for this latter example is not correct. Finding this is a little more involved than the one liner offered.

nrow(multiPermChoose11)
[1] 2310000

And just to show that this is correct:

length(unique(apply(multiPermChoose11, 1, paste, collapse ="")))
[1] 2310000

There is also a function from iterpc that computes the number of permutations of multisets called np_multiset

iterpc::np_multiset(rep(4,4), 11)
[1] 2310000

For more information regarding problems like this in R, I wrote a thorough overview to the question: R: Permutations and combinations with/without replacement and for distinct/non-distinct items/multiset by @RandyLai (author of arrangements and iterpc, both of which are capable of doing the work above efficiently.)

like image 79
Joseph Wood Avatar answered Nov 16 '22 04:11

Joseph Wood