I am writing some code to generate balanced experimental designs for market research, specifically for use in conjoint analysis and maximum difference scaling.
The first step is to generate a Partially Balanced Incomplete Block (PBIB) design. This is straightforward using the R package AlgDesign.
For most types of research such a design would be sufficient. However, in market research one wants to control for order effects in each block. This is where I would appreciate some help.
Create test data
# The following code is not essential in understanding the problem,
# but I provide it in case you are curious about the origin of the data itself.
#library(AlgDesign)
#set.seed(12345)
#choices <- 4
#nAttributes <- 7
#blocksize <- 7
#bsize <- rep(choices, blocksize)
#PBIB <- optBlock(~., withinData=factor(1:nAttributes), blocksizes=bsize)
#df <- data.frame(t(array(PBIB$rows, dim=c(choices, blocksize))))
#colnames(df) <- paste("Item", 1:choices, sep="")
#rownames(df) <- paste("Set", 1:nAttributes, sep="")
df <- structure(list(
  Item1 = c(1, 2, 1, 3, 1, 1, 2),
  Item2 = c(4, 4, 2, 5, 3, 2, 3),
  Item3 = c(5, 6, 5, 6, 4, 3, 4),
  Item4 = c(7, 7, 6, 7, 6, 7, 5)),
  .Names = c("Item1", "Item2", "Item3", "Item4"),
  row.names = c("Set1", "Set2", "Set3", "Set4", "Set5", "Set6", "Set7"),
  class = "data.frame")
Define two helper functions
balanceMatrix calculates the balance of the design, counting how often each attribute appears in each column:
balanceMatrix <- function(x){
  t(sapply(unique(unlist(x)), function(i) colSums(x == i)))
}
balanceScore calculates a metric of 'fit' from the balance matrix - lower scores are better, with zero being perfect:
balanceScore <- function(x){
  sum((1 - x)^2)
}
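As a quick illustration of the scoring (a minimal check, using only the function just defined): a perfectly balanced 7 x 4 design would have every attribute appearing exactly once in every column, so its balance matrix contains only ones and it scores zero:
balanceScore(matrix(1, nrow=7, ncol=4))
[1] 0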
Define a function that resamples the rows at random
findBalance <- function(x, nrepeat=100){
  df <- x
  minw <- Inf
  for (n in 1:nrepeat){
    # shuffle the items within each row, then score the shuffled design
    for (i in 1:nrow(x)){df[i, ] <- sample(df[i, ])}
    w <- balanceMatrix(df)
    sumw <- balanceScore(w)
    # keep the best (lowest-scoring) design found so far
    if(sumw < minw){
      dfbest <- df
      minw <- sumw
    }
  }
  dfbest
}
Main code
The dataframe df is a balanced design of 7 sets. Each set will display 4 items to the respondent. The numeric values in df refer to 7 different attributes. For example, in Set1 the respondent will be asked to choose his/her preferred option from attributes 1, 4, 5 and 7.
The ordering of items in each set is conceptually not important; thus an ordering of (1, 4, 5, 7) is identical to (7, 5, 4, 1).
However, to get a fully balanced design, each attribute needs to appear an equal number of times in each column. This design is therefore imbalanced, since attribute 1 appears 4 times in column 1:
df
Item1 Item2 Item3 Item4
Set1 1 4 5 7
Set2 2 4 6 7
Set3 1 2 5 6
Set4 3 5 6 7
Set5 1 3 4 6
Set6 1 2 3 7
Set7 2 3 4 5
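For reference, the helper functions above can be applied directly to this design to quantify the imbalance (an illustrative check; the numbers in the comments are my own counts from the table above):
balanceMatrix(df)                 # attribute 1 appears 4 times under Item1, attribute 7 four times under Item4
balanceScore(balanceMatrix(df))   # 40, far from the perfect score of 0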
To try and find a more balanced design, I wrote the function findBalance. This conducts a random search for better solutions by randomly shuffling the items within each row of df. With 100 repeats it finds the following best solution:
set.seed(12345)
dfbest <- findBalance(df, nrepeat=100)
dfbest
Item1 Item2 Item3 Item4
Set1 7 5 1 4
Set2 6 7 4 2
Set3 2 1 5 6
Set4 5 6 7 3
Set5 3 1 6 4
Set6 7 2 3 1
Set7 4 3 2 5
This appears more balanced, and the calculated balance matrix contains lots of ones. The balance matrix counts the number of times each attribute appears in each column. For example, the following table indicates (in its top row) that attribute 1 does not appear at all in column 1, and appears twice in column 2:
balanceMatrix(dfbest)
Item1 Item2 Item3 Item4
[1,] 0 2 1 1
[2,] 1 1 1 1
[3,] 1 1 1 1
[4,] 1 0 1 2
[5,] 1 1 1 1
[6,] 1 1 1 1
[7,] 2 1 1 0
The balance score for this solution is 6, corresponding to the six cells that are unequal to 1 (three 0s and three 2s, each contributing 1 to the sum of squares):
balanceScore(balanceMatrix(dfbest))
[1] 6
My question
Thank you for following this detailed example. My question is: how can I rewrite this search function to be more systematic? Instead of relying on a purely random search, I'd like to tell R to minimise balanceScore(balanceMatrix(df)) by shuffling the values within each row of df, subject to the set of values in each row staying the same.
OK, I somehow misunderstood your question. So bye bye Fedorov, hello applied Fedorov.
The following algorithm is based on the second iteration of the Fedorov algorithm:
1. Start from a random ordering of the items within each set.
2. Go through the sets one by one; for the current set, try every possible ordering of its items while keeping the other sets fixed, and retain the ordering that gives the lowest balance score.
3. Repeat over all sets until the balance score reaches 0 or the maximum number of iterations is reached.
Optionally, you can restart the procedure after 10 iterations and start from another starting point. In your test case, it turned out that a few starting points converged very slowly to 0. The function below found balanced experimental designs with a score of 0 in, on average, 1.5 seconds on my computer:
> X <- findOptimalDesign(df)
> balanceScore(balanceMatrix(X))
[1] 0
> mean(replicate(20, system.time(X <- findOptimalDesign(df))[3]))
[1] 1.733
So this is the function now (given your original balanceMatrix and balanceScore functions):
findOptimalDesign <- function(x, iter=4, restart=TRUE){
  stopifnot(require(combinat))
  # transform rows to a list of sets
  sets <- unlist(apply(x, 1, list), recursive=FALSE)
  nsets <- NROW(x)
  # C0 contains all possible design points (every ordering of every set)
  C0 <- lapply(sets, permn)
  n <- gamma(NCOL(x) + 1)  # number of orderings per set, i.e. ncol(x)!
  # starting point: a random ordering for each set
  id <- sample(1:n, nsets)
  Sol <- sapply(1:nsets, function(i) C0[[i]][id[i]])
  IT <- iter
  # other iterations
  while(IT > 0){
    for(i in 1:nsets){
      nn <- 1:n
      # score every ordering of set i, keeping the other sets fixed
      scores <- sapply(nn, function(p){
        tmp <- Sol
        tmp[[i]] <- C0[[i]][[p]]
        w <- balanceMatrix(do.call(rbind, tmp))
        balanceScore(w)
      })
      idnew <- nn[which.min(scores)]
      Sol[[i]] <- C0[[i]][[idnew]]
    }
    # check if score is 0
    out <- as.data.frame(do.call(rbind, Sol))
    score <- balanceScore(balanceMatrix(out))
    if (score == 0) {break}
    IT <- IT - 1
    # if asked, restart from a new random starting point
    if(IT == 0 && restart){
      id <- sample(1:n, nsets)
      Sol <- sapply(1:nsets, function(i) C0[[i]][id[i]])
      IT <- iter
    }
  }
  out
}
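If you want a quick sanity check on the result (a minimal sketch reusing balanceMatrix and the df from the question, not part of the algorithm itself):
X <- findOptimalDesign(df)
# with a score of 0, every attribute appears exactly once in every column
all(balanceMatrix(X) == 1)
# and every set still contains the same attributes as before, just reordered
all(t(apply(X, 1, sort)) == t(apply(df, 1, sort)))
Both checks should return TRUE whenever a score of 0 is reached.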
HTH
EDIT: fixed a small bug (it restarted immediately after every round, as I forgot to condition on IT). With that fixed, it also runs a bit faster.