
Generate unique patterns of numbers (e.g. 1221 considered same pattern as 2112)

Tags: r

I have a data frame containing 7 different dilutions that I want to assign into 3 different bins in all possible combinations for later use in lpSolve. I can generate all 2187 possible combinations using:

expand.grid(1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3)

However, since the actual bin number is not important (but position is), the following entries are all considered identical in this context:

c(1, 1, 2, 2, 2, 3, 3)
c(2, 2, 3, 3, 3, 1, 1)
c(3, 3, 1, 1, 1, 2, 2)
c(3, 3, 2, 2, 2, 1, 1)
c(1, 1, 3, 3, 3, 2, 2)
...

How do I generate only the unique "patterns", either by filtering the expand.grid output or by using another (custom) function? For example, the lengths output from rle for all of the above vectors would be 2 3 2, but that would also be the case for c(1, 1, 2, 2, 2, 1, 1), which should not be considered identical to the above.

Any fast way around this? I do not need to go higher than 5 bins and 8 dilutions.

Kristoffer asked Mar 07 '13 10:03


2 Answers

Here is an answer:

mat <- expand.grid(1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3)

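# Relabel each row so that bins are numbered in order of first appearance
mat <- t(apply(mat, 1, 
               function(x){
                 un <- unique(x)                     # distinct values, in order of first appearance
                 map <- setNames(seq_along(un), un)  # old label -> new label
                 map[as.character(x)]
               }))

# Rows that share a pattern are now identical, so drop the duplicates
mat <- mat[!duplicated(mat), ]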
nrow(mat)
# [1] 365

The logic is as follows. Take c(3,3,1,2,1,2,3); I convert it to c(1,1,2,3,2,3,1), because 3 is the first distinct number from the beginning, 1 is the second, and 2 is the third. This puts every row into the same format, which lets me use duplicated. setNames is useful here: it creates a map from one set of integers to another:

setNames(1:3, 3:1)
# 3 2 1 
# 1 2 3 
setNames(1:3, 3:1)[c("2", "1")]
# 2 1 
# 2 3 
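The same first-appearance relabelling can also be sketched with match(x, unique(x)), which avoids the character lookup. This is an alternative formulation, not the code from the answer; the helper names canonical and unique_patterns are made up for illustration, and the function simply generalises the expand.grid call to n positions and k bins.

```r
# Relabel a vector by order of first appearance,
# e.g. c(3, 3, 1, 2, 1, 2, 3) becomes 1 1 2 3 2 3 1
canonical <- function(x) match(x, unique(x))

# Hypothetical helper: all distinct patterns of n positions over at most k bins
unique_patterns <- function(n, k) {
  grid <- as.matrix(expand.grid(rep(list(seq_len(k)), n)))
  # apply() returns one column per input row; refill row-wise into n columns
  canon <- matrix(apply(grid, 1, canonical), ncol = n, byrow = TRUE)
  canon[!duplicated(canon), , drop = FALSE]
}

canonical(c(3, 3, 1, 2, 1, 2, 3))
# [1] 1 1 2 3 2 3 1
nrow(unique_patterns(7, 3))
# [1] 365
```

For the upper limit mentioned in the question (5 bins, 8 dilutions) the grid has 5^8 = 390625 rows, so this brute-force filter should still be feasible, though noticeably slower.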

Finally, the proof that 365 is the right count:

[image: counting formula]

The formula takes into account the cases where one, two or three different numbers are used. In particular: [(a single number takes all 7 positions)] + [(choosing 1 position for one number, with all the rest for another) + (choosing 2 positions for one number, with all the rest for another) + (choosing 3 positions for one number, with all the rest for another)] + [(choosing 1 position for the first number, 1 position for the second, with all the rest for the third number. Here the 1st and 2nd numbers are interchangeable, since they both occur once, so we have to divide this term by two) + ...(same logic as before)...]. The first bracket contributes 1 and the second 7 + 21 + 35 = 63, which leaves 301 for the three-number cases.
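As an independent cross-check of that count (not the formula from the answer), the number of patterns using exactly j distinct bins is the Stirling number of the second kind S(n, j), so the total for at most k bins is the sum over j = 1..k. The sketch below uses the standard recurrence; stirling2 and count_patterns are illustrative names.

```r
# Stirling numbers of the second kind via the recurrence
# S(n, k) = k * S(n - 1, k) + S(n - 1, k - 1)
stirling2 <- function(n, k) {
  if (n == k) return(1)
  if (k == 0 || k > n) return(0)
  k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)
}

# Number of distinct patterns of n positions using at most k bins
count_patterns <- function(n, k) {
  sum(sapply(seq_len(k), function(j) stirling2(n, j)))
}

count_patterns(7, 3)  # 1 + 63 + 301 = 365
count_patterns(8, 5)  # 3845
```

The second call gives the expected size of the result for the largest case mentioned in the question (8 dilutions, 5 bins).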

Julius Vainora answered Oct 07 '22 05:10


This?

data <- expand.grid(1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3)
len <- apply(data,1,function(x) c(rle(x)$lengths[1:7], nchar(paste(unique(sort(rle(x)$value)), collapse=''))))
data <- data[!(duplicated(t(len))), ]

Or, as @Arun pointed out:

data <- expand.grid(1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3)
len <- apply(data,1,function(x) c(rle(x)$lengths[1:7], length(unique(x))))
data <- data[!(duplicated(t(len))), ]
Rcoster answered Oct 07 '22 05:10
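A caveat on the rle-based key: it records run lengths and the number of distinct values, but not which runs share a bin. Assuming patterns are meant to be compared position by position, as the question describes, two different patterns can end up with the same key, so this filter may drop rows that the first-appearance relabelling keeps. A small check, with rle_key and canonical as illustrative names:

```r
# Key used in the answer above: run lengths padded to 7, plus the number of distinct values
rle_key <- function(x) c(rle(x)$lengths[1:7], length(unique(x)))

# First-appearance relabelling, for comparison
canonical <- function(x) match(x, unique(x))

a <- c(1, 2, 1, 3, 3, 3, 3)  # position 3 shares a bin with position 1
b <- c(1, 2, 3, 1, 1, 1, 1)  # positions 4 to 7 share a bin with position 1

identical(rle_key(a), rle_key(b))      # TRUE: both keys are 1 1 1 4 NA NA NA 3
identical(canonical(a), canonical(b))  # FALSE: the patterns differ
```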