I have a data frame containing 7 different dilutions that I want to assign to 3 different bins in all possible combinations for later use in lpSolve. I can generate all 2187 possible combinations using:
expand.grid(1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3)
However, since the actual bin number is not important (but position is), the following entries are all considered identical in this context:
c(1, 1, 2, 2, 2, 3, 3)
c(2, 2, 3, 3, 3, 1, 1)
c(3, 3, 1, 1, 1, 2, 2)
c(3, 3, 2, 2, 2, 1, 1)
c(1, 1, 3, 3, 3, 2, 2)
...
How do I generate only unique "patterns", either by filtering the expand.grid output or by using another (custom) function? For example, the lengths output from rle for all of the above vectors would be 2 3 2, but that would also be the case for c(1, 1, 2, 2, 2, 1, 1), which should not be considered identical to the above.
Any fast way around this? I do not need to go higher than 5 bins and 8 dilutions.
Here is an answer:
mat <- expand.grid(1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3)
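# Relabel each row by order of first appearance
mat <- t(apply(mat, 1,
  function(x){
    un <- unique(x)
    map <- setNames(seq_along(un), un)  # old label -> new label
    map[as.character(x)]
  }))
mat <- mat[!duplicated(mat), ]  # keep one row per relabelled pattern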
nrow(mat)
# [1] 365
The logic is as follows: take c(3,3,1,2,1,2,3) and convert it to c(1,1,2,3,2,3,1), because 3 is the first unique number from the beginning, 1 is the second and 2 is the third. In this way I convert all the rows to the same format, which allows me to use duplicated. setNames was useful here; it creates a map from one set of integers to another:
setNames(1:3,3:1)
3 2 1
1 2 3
setNames(1:3,3:1)[c("2","1")]
2 1
2 3
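As a minimal sketch of the same relabelling, match(x, unique(x)) gives each value the index of its first appearance, so no name lookup is needed. This swaps in match for the setNames map above. The function names canon_labels and unique_patterns are hypothetical, and the wrapper assumes at least two dilutions.

```r
# Relabel a vector by order of first appearance (hypothetical helper).
canon_labels <- function(x) match(x, unique(x))

canon_labels(c(3, 3, 1, 2, 1, 2, 3))
# [1] 1 1 2 3 2 3 1

# All distinct patterns for n_items dilutions in at most n_bins bins.
# Assumes n_items >= 2 so that apply() returns a matrix.
unique_patterns <- function(n_items, n_bins) {
  stopifnot(n_items >= 2, n_bins >= 1)
  grid <- as.matrix(expand.grid(rep(list(seq_len(n_bins)), n_items)))
  canon <- t(apply(grid, 1, canon_labels))
  unique(canon)  # unique() on a matrix drops duplicate rows
}

nrow(unique_patterns(7, 3))
# [1] 365
```

For the largest case mentioned in the question (8 dilutions, 5 bins) the grid has 5^8 = 390625 rows, which should still be small enough to enumerate this way.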
Finally, the proof:
The count of 365 takes into consideration the cases where one, two or three different numbers are used. In particular: [(a single number takes all 7 positions)] + [(choosing 1 position for one number, with all the rest for another) + (choosing 2 positions for one number, with all the rest for another) + (choosing 3 positions, with all the rest for another)] + [(choosing 1 position for the first number, 1 position for the second and all the rest for the third number; the 1st and 2nd are interchangeable here because they both occur once, so this term is divided by two) + ...(same logic as before)...]
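The counting argument can be checked numerically. The sketch below is a reconstruction from the verbal description, not the answer's original formula: the three-number terms are written out for the group sizes (1,1,5), (1,2,4), (1,3,3) and (2,2,3), dividing by two whenever two groups have the same size. The variable names are illustrative.

```r
# One number fills all 7 positions.
one_value <- 1

# Two numbers: choose 1, 2 or 3 positions for the smaller group.
two_values <- choose(7, 1) + choose(7, 2) + choose(7, 3)

# Three numbers, by group sizes (smallest first); halve when two sizes tie.
three_values <-
  choose(7, 1) * choose(6, 1) / 2 +  # sizes 1, 1, 5
  choose(7, 1) * choose(6, 2) +      # sizes 1, 2, 4
  choose(7, 1) * choose(6, 3) / 2 +  # sizes 1, 3, 3
  choose(7, 2) * choose(5, 2) / 2    # sizes 2, 2, 3

total <- one_value + two_values + three_values
total
# [1] 365
```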
This?
data <- expand.grid(1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3)
len <- apply(data,1,function(x) c(rle(x)$lengths[1:7], nchar(paste(unique(sort(rle(x)$values)), collapse=''))))
data <- data[!(duplicated(t(len))), ]
Or, as @Arun pointed out:
data <- expand.grid(1:3, 1:3, 1:3, 1:3, 1:3, 1:3, 1:3)
len <- apply(data,1,function(x) c(rle(x)$lengths[1:7], length(unique(x))))
data <- data[!(duplicated(t(len))), ]
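One caveat worth checking before relying on this key: run lengths plus the number of distinct values do not record which earlier value a run returns to, so two different patterns can share a key. A small check, using two vectors picked for illustration and a first-appearance relabelling via match (both are assumptions added here, not part of the answer above):

```r
# Key used above: run lengths padded to 7, plus the count of distinct values.
rle_key <- function(x) c(rle(x)$lengths[1:7], length(unique(x)))

# First-appearance relabelling, used here only as a reference (hypothetical helper).
canon_labels <- function(x) match(x, unique(x))

a <- c(1, 2, 1, 3, 3, 3, 3)  # position 3 shares a bin with position 1
b <- c(1, 2, 3, 1, 1, 1, 1)  # positions 4-7 share a bin with position 1

same_key     <- identical(rle_key(a), rle_key(b))
same_pattern <- identical(canon_labels(a), canon_labels(b))

same_key
# [1] TRUE
same_pattern
# [1] FALSE
```

Since a and b group the positions differently but get the same key, filtering on this key can drop patterns that should be kept; comparing nrow(data) against the expected count is a quick sanity check.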