Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to randomize a vector without repeating specific elements in predefined triples?

I started with an allegedly simple setup, which turned out to become quite challenging:

Say, we have a bowl which contains W = 60 white balls, B = 10 blue balls, G = 10 green balls and Y = 10 yellow balls. Now I start to draw triples from that bowl and store them, until the bowl is empty. However, there is one rule:

RULE:

Each triple may not contain more than one non-white ball of the same colour!

When done I am interested in the ratio of triples with 0, 1, 2 and 3 non-white balls, respectively.

To solve this problem I started with the idea of drawing and rejecting samples, until there is a sample, which fullfills the RULE above.

I tried with this (hopefully reproducible) code:

W = rep(0, times = 60)
BGY = c(rep(1, times = 10),rep(2, times = 10),rep(3, times = 10))
sumup = matrix(c(rep(1,times=3)),byrow=FALSE)
OUTPUT = c(0,0,0,0) 

getBALLS = function(W,BGY){
  k = 0
  while (k == 0){
    POT = c(W, BGY)
    STEPS = (length(W) + length(BGY))/3 
    randPOT <<- sample(POT, STEPS*3, replace=FALSE)
    for(j in 1:STEPS){
      if (.subset2(randPOT,3*j-2)!=.subset2(randPOT,3*j-1) &&
          .subset2(randPOT,3*j-2)!= .subset2(randPOT,3*j) && 
          .subset2(randPOT,3*j-1)!=.subset2(randPOT,3*j)){
        next
      }
      else getBALLS(W, BGY)
    }
    k = 1
  }
  TABLES = matrix(randPOT, nrow=3, byrow=FALSE)
  Bdistr = t(TABLES) %*% sumup
  for(i in 1:STEPS){
    if (.subset2(Bdistr,i)==1) OUTPUT[1] <<- .subset2(OUTPUT,1)+1
    else if (.subset2(Bdistr,i)==0) OUTPUT[4] <<- .subset2(OUTPUT,4)+1
    else if (.subset2(Bdistr,i)==2) OUTPUT[2] <<- .subset2(OUTPUT,2)+1
    else OUTPUT[3] <<- .subset2(OUTPUT,3)+1
  }
  rOUTPUT = OUTPUT/ STEPS
  return(rOUTPUT)
}    

set.seed(1)
getBALLS(W,BGY)

Unfortunately I encountered two problems:

  1. The loop iterates too many times! It seems like the rule is violated quite often, which makes sampling in that way probably not feasibile.
  2. Although I tried to call the most efficient functions, when there where more than one way of getting there (e.g. .subset2 call), I have a feeling, that this code is quite inefficient in solving this problem.

Next I tried with two-stage sampling (more specific the mstage function from the sampling package):

Stage1 = c( rep(0,12), rep(1,3), rep(2,3) )
Stage2 = c( rep(0,12), rep(1,3), rep(2,3) )
b = data.frame(Stage1, Stage2)
probs = list( list( (1/12) , (1/3), (1/3) ), list( rep(1/12,12),rep(1/3,3),rep(1/3,3) ) )
m = mstage( b, stage = list("cluster","cluster"), varnames = list("Stage1","Stage2"), 
            size = list(3,c(1,1,1)), method = "systematic", pik = probs)

While this didn't work out either, I also felt like this approach doesn't fit my problem that well!

All told it seems to me a bit like I was using a sledgehammer to crack a nut and I feel like there is a much more efficient way in tackling this problem (especially since I'd like to run some Monte-Carlo simmulations afterwards).

I'd appreciate any help! Thanks in advance!

like image 713
freeconomist Avatar asked Jan 04 '16 11:01

freeconomist


1 Answers

Here is an alternative approach which no doubt could be improved, but which I think makes some kind of statistical sense (having a particular colour in a sample of three makes it less likely another colour is in the same sample of three).

coloursinsamples <- function (W,B,G,Y){
    WBGY <- c(W,B,G,Y)
    if(sum(WBGY) %% 3 != 0){ warning("cannot take exact full sample") }
    numbersamples <- sum(WBGY) / 3 
    if(max(WBGY[2:4]) > numbersamples){ warning("too many of a colour") }

    weights <- rep(3,numbersamples)
    sampleB <- sample(numbersamples, size=WBGY[2], prob=weights)
    weights[sampleB] <- weights[sampleB]-1
    sampleG <- sample(numbersamples, size=WBGY[3], prob=weights)
    weights[sampleG] <- weights[sampleG]-1
    sampleY <- sample(numbersamples, size=WBGY[4], prob=weights)
    weights[sampleY] <- weights[sampleY]-1

    numbercolours <- table(table(c(sampleB,sampleG,sampleY)))
    result <- c("0" = numbersamples - sum(numbercolours), numbercolours)
    if(! "1" %in% names(result)){ result <- c(result, "1"=0) }
    if(! "2" %in% names(result)){ result <- c(result, "2"=0) }
    if(! "3" %in% names(result)){ result <- c(result, "3"=0) }
    result[as.character(0:3)]
    }

set.seed(1)
coloursinsamples(6,1,1,1)
coloursinsamples(60,10,10,10)
coloursinsamples(600,100,100,100)
coloursinsamples(6000,1000,1000,1000)
like image 99
Henry Avatar answered Oct 21 '22 19:10

Henry