I have spent over a day trying to accomplish what seems to be a very simple thing. I have to create 300 'random' sequences in which the numbers 1,2,3 and 4 all appear exactly 12 times, but the same number is never used twice 'in a row'/consecutively.
My best attempts (I guess) were:
1. Have R sample 48 items without replacement, test whether there are consecutive values with rle, then use only the sequences that do not contain consecutive values. Problem: there are almost no random sequences that meet this criterion, so it takes forever.
2. Have R create sequences without consecutive values (see code).
pop <- rep(1:4, 12)
y <- c()
while (length(y) != 48) {
  y <- c(y, sample(pop, 48 - length(y), replace = FALSE))
  y <- y[!c(FALSE, diff(y) == 0)]
}
Problem: this creates sequences with varying numbers of each value. I then tried to use only those sequences with exactly 12 of each value, but that only brought me back to problem 1: takes forever.
There must be some easy way to do this, right? Any help is greatly appreciated!
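The two requirements in the question (each of 1 to 4 exactly 12 times, and no value twice in a row) can be checked with a small helper. This is only a minimal sketch: the name `is_valid_sequence` and its arguments are made up for illustration and do not come from the original post.

```r
# Hypothetical helper: TRUE if `x` contains each of `values` exactly `times`
# times and no two neighbouring elements are equal.
is_valid_sequence <- function(x, values = 1:4, times = 12) {
  counts_ok <- length(x) == length(values) * times &&
    all(tabulate(match(x, values), nbins = length(values)) == times)
  no_repeats <- all(rle(x)$lengths == 1)  # every run has length 1
  counts_ok && no_repeats
}

is_valid_sequence(rep(1:4, 12))         # 1,2,3,4,1,2,...: TRUE
is_valid_sequence(rep(1:4, each = 12))  # long runs of one value: FALSE
```

A check like this is handy for verifying the output of any generator, whichever approach is used to produce the sequences.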
When we sample with replacement, the two sample values are independent. Practically, this means that what we get on the first one doesn't affect what we get on the second. Mathematically, this means that the covariance between the two is zero. In sampling without replacement, the two sample values aren't independent.
In sampling without replacement, each sample unit of the population has only one chance to be selected in the sample. For example, if one draws a simple random sample such that no unit occurs more than one time in the sample, the sample is drawn without replacement.
Sampling with replacement simply means that each number is “replaced” after it is selected, so that the same number can show up more than once. This is what you want when simulating dice, since what you roll on one die shouldn't affect what you roll on any of the others. By contrast, sampling 10 numbers between 1 and 20 without replacement always gives 10 distinct numbers.
When you sample with replace = FALSE, an element that has been picked is not put back into the population, so it cannot be picked again in the same sample.
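The difference can be seen directly with R's `sample()`. This is a small illustrative sketch; the seed and the variable names are arbitrary choices, not part of the original text.

```r
set.seed(1)  # arbitrary seed, only so the draws are reproducible

# Without replacement: 10 distinct numbers between 1 and 20.
without_repl <- sample(1:20, 10, replace = FALSE)
anyDuplicated(without_repl) == 0   # TRUE by construction

# With replacement: five independent die rolls; repeats are possible.
dice <- sample(1:6, 5, replace = TRUE)

# Drawing more items than the population holds only works with replacement.
more_than_pop <- sample(1:4, 48, replace = TRUE)
length(more_than_pop)              # 48
```

Note that sampling 48 values from 1:4 with replacement gives neither exactly 12 of each value nor any guarantee against neighbouring repeats, which is why the question starts from a population of 48 items instead.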
Another option is to use a Markov chain Monte Carlo method: swap 2 randomly chosen positions and move to the new sample only when 1) the two numbers being swapped are different and 2) no 2 identical numbers end up adjacent. Because consecutive samples from the chain are highly correlated, we can generate a lot of samples and then randomly select 300 of them:
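v <- rep(1:4, 12)                    # start 1,2,3,4,1,2,...: no adjacent repeats
l <- 48
nr <- 3e5
m <- matrix(0, nrow=nr, ncol=l)
count <- 0
while(count < nr) {
    i <- sample(l, 2)                # two distinct positions
    if (v[i[1L]] != v[i[2L]]) {      # skip swaps of two equal numbers
        v[i] <- v[i[2:1]]            # swap the two values
        if (!any(diff(v)==0)) {
            count <- count + 1
            m[count, ] <- v
        } else {
            v[i] <- v[i[2:1]]        # undo the swap
        }
    }
}
a <- m[sample(nr, 300),]
a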
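As a variation on the same idea, the swap step can be wrapped in a function that keeps only every `thin`-th state of the chain instead of storing all of them. This is a sketch under assumptions, not the answer's method: the function name `sample_by_swaps` and the values of `thin` and `burn_in` are arbitrary, and I have not verified that swaps alone can reach every valid sequence. Here a rejected swap leaves the sequence unchanged but still counts as a step.

```r
# Hypothetical helper: run the swap chain and keep every `thin`-th state.
# `thin` and `burn_in` are guesses, not tuned values.
sample_by_swaps <- function(n_keep, thin = 500, burn_in = 5000) {
  v <- rep(1:4, 12)                 # valid start: 1,2,3,4,1,2,...
  out <- matrix(0L, nrow = n_keep, ncol = length(v))
  n_steps <- burn_in + n_keep * thin
  kept <- 0L
  for (step in seq_len(n_steps)) {
    i <- sample(length(v), 2)       # two distinct positions
    if (v[i[1L]] != v[i[2L]]) {     # only swap two different values
      proposal <- v
      proposal[i] <- v[rev(i)]      # swap the two values
      # accept only if no identical neighbours result
      if (!any(diff(proposal) == 0)) v <- proposal
    }
    # after the burn-in, store every `thin`-th state
    if (step > burn_in && (step - burn_in) %% thin == 0) {
      kept <- kept + 1L
      out[kept, ] <- v
    }
  }
  out
}

set.seed(42)
a_thinned <- sample_by_swaps(300)   # 300 x 48 matrix, one sequence per row
```

Every row keeps exactly 12 of each value because swaps never change the counts, and the acceptance test keeps neighbours distinct. A larger `thin` should reduce the correlation between stored rows at the cost of more iterations.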
Maybe using replicate() with a repeat loop is faster. Here is an example with 3 sequences. It looks like this would take approx. 1490 seconds with 300 (not tested).
set.seed(42)
seqc <- rep(1:4, each=12)  # starting sequence
system.time(
  res <- replicate(3, {
    repeat {
      seqcs <- sample(seqc, 48, replace=FALSE)
      if (!any(diff(seqcs) == 0)) break
    }
    seqcs
  })
)
#    user  system elapsed
#   14.88    0.00   14.90
res[1:10, ]
#       [,1] [,2] [,3]
#  [1,]    4    2    3
#  [2,]    1    1    4
#  [3,]    3    2    1
#  [4,]    1    1    4
#  [5,]    2    3    1
#  [6,]    4    1    2
#  [7,]    3    4    4
#  [8,]    2    1    1
#  [9,]    3    4    4
# [10,]    4    3    2