R: how to sample without replacement AND without consecutive same values

Tags: r, sample

I have spent over a day trying to accomplish what seems to be a very simple thing. I have to create 300 'random' sequences in which the numbers 1, 2, 3 and 4 each appear exactly 12 times, but the same number never appears twice in a row.

My best attempts (I guess) were:

  1. Have R sample 48 items without replacement, test for consecutive values with rle, then keep only the sequences that contain none. Problem: almost no random sequences meet this criterion, so it takes forever.

  2. Have R create sequences without consecutive values (see code).

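pop <- rep(1:4, 12)   # 12 copies each of 1, 2, 3, 4
y <- c()
while (length(y) != 48) {
  # top up to 48 values by sampling from pop without replacement
  y <- c(y, sample(pop, 48 - length(y), replace = FALSE))
  # drop every value that equals its predecessor
  y <- y[!c(FALSE, diff(y) == 0)]
}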

Problem: this creates sequences with varying numbers of each value, because every top-up draws from the full pool again regardless of what is already in the sequence. I then tried to keep only those sequences with exactly 12 of each value, but that brought me back to problem 1: it takes forever.

There must be some easy way to do this, right? Any help is greatly appreciated!
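For reference, both requirements can be checked with one small function. This is only a sketch: `is_valid_seq` and its arguments are hypothetical names, not part of the original code.

```r
# Hypothetical helper: TRUE if x contains each of `values` exactly `n_each`
# times and never has the same value in two adjacent positions.
is_valid_seq <- function(x, values = 1:4, n_each = 12) {
  idx <- match(x, values)                      # NA for anything not in `values`
  counts_ok <- !anyNA(idx) &&
    all(tabulate(idx, nbins = length(values)) == n_each)
  no_repeats <- all(rle(x)$lengths == 1)       # every run has length 1
  counts_ok && no_repeats
}

is_valid_seq(rep(1:4, 12))          # TRUE:  1,2,3,4,1,2,3,4,...
is_valid_seq(rep(1:4, each = 12))   # FALSE: right counts, but long runs
```

Any candidate sequence, however it is generated, can be passed through this check before it is kept.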

CookieMons asked Oct 24 '19 11:10



2 Answers

Another option is to use a Markov chain Monte Carlo method: swap 2 randomly chosen numbers and move to the new sample only when 1) we are not swapping the same number and 2) no 2 identical numbers are adjacent. Because consecutive samples from the chain are highly correlated, we can generate a lot of samples and then randomly select 300 of them:

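v <- rep(1:4, 12)               # valid start: 1,2,3,4,1,2,3,4,...
l <- 48
nr <- 3e5                       # number of accepted states to record
m <- matrix(0, nrow=nr, ncol=l)
count <- 0
while(count < nr) {
    i <- sample(l, 2)           # two distinct positions
    # 1) skip swaps of equal values (sample() already makes the positions differ)
    if (v[i[1L]] != v[i[2L]]) {
        v[i] <- v[i[2:1]]       # swap
        if (!any(diff(v)==0)) { # 2) accept only if no adjacent duplicates
            count <- count + 1
            m[count, ] <- v
        } else {
            v[i] <- v[i[2:1]]   # undo the swap
        }
    }
}
a <- m[sample(nr, 300),]        # 300 rows chosen at random from the chain
a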
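A variant of the same swap idea, as a minimal sketch: instead of storing every accepted state and subsampling afterwards, keep only every `thin`-th state of the chain. The names `swap_chain`, `n_keep` and `thin` are hypothetical, and the default thinning interval of 500 is an arbitrary assumption rather than a tuned value. Whether the chain can reach every valid sequence is not verified here.

```r
# Sketch: swap-based chain that records one state every `thin` proposals.
# Assumes `values` has at least 2 elements so the start sequence is valid.
swap_chain <- function(n_keep = 300, thin = 500, values = 1:4, n_each = 12) {
  v <- rep(values, n_each)             # valid start: 1,2,3,4,1,2,3,4,...
  len <- length(v)
  out <- matrix(0L, nrow = n_keep, ncol = len)
  for (k in seq_len(n_keep)) {
    for (step in seq_len(thin)) {
      i <- sample.int(len, 2)          # two distinct positions
      w <- v
      w[i] <- w[rev(i)]                # propose swapping them
      if (!any(diff(w) == 0)) v <- w   # accept only if no adjacent duplicates
    }
    out[k, ] <- v                      # record the current state
  }
  out
}

set.seed(1)
s <- swap_chain(n_keep = 20, thin = 200)   # small run for illustration
dim(s)                                      # 20 rows (sequences), 48 columns
```

Swaps never change how often each value occurs, so every row keeps exactly 12 of each number; only the adjacency rule needs to be tested at each step.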
chinsoon12 answered Oct 11 '22 12:10


Maybe using replicate() with a repeat loop is faster. Here is an example with 3 sequences. Extrapolating from the timing below (about 14.9 seconds for 3 sequences), 300 sequences would take approx. 1490 seconds (not tested).

set.seed(42)
seqc <- rep(1:4, each=12)  # starting sequence

system.time(
  res <- replicate(3, {
    repeat {
      seqcs <- sample(seqc, 48, replace=FALSE) 
      if (!any(diff(seqcs) == 0)) break
    }
    seqcs
  })
)
#  user  system elapsed 
# 14.88    0.00   14.90 

res[1:10, ]
#       [,1] [,2] [,3]
#  [1,]    4    2    3
#  [2,]    1    1    4
#  [3,]    3    2    1
#  [4,]    1    1    4
#  [5,]    2    3    1
#  [6,]    4    1    2
#  [7,]    3    4    4
#  [8,]    2    1    1
#  [9,]    3    4    4
# [10,]    4    3    2
jay.sf answered Oct 11 '22 10:10