Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting algorithm to keep equal values separated

Psychology experiments often require you to pseudo-randomize the trial order, so that the trials are apparently random, but you don't get too many similar trials consecutively (which could happen with a purely random ordering).

Let's say that the visual display on each trial has a colour and a size:

display_list = []
colours = {0: 'red', 1: 'blue', 2: 'green', 3: 'yellow'}
sizes = [1] * 20 + [2] * 20 + [3] * 20 + [4] * 20 + [5] * 20 + [6] * 20
for i in range(120):
    display_list.append({'colour': colours[i % 4], 'size': sizes[i]})
print(display_list)

And we can look at the maximum number of consecutive trials that has the same value for either property using this function:

def consecutive_properties(seq, field):
    longest_run = 0
    prev_value = None
    current_run = 0
    for d in seq:
        if d[field] == prev_value:
            current_run += 1
        else:
            current_run = 1
        if current_run > longest_run:
            longest_run = current_run
        prev_value = d[field]
    return longest_run

Output:

>>> print("Consecutive colours: ", consecutive_properties(display_list, 'colour')
('Consecutive colours: ', 1)

>>> print("Consecutive sizes: ", consecutive_properties(display_list, 'size'))
('Consecutive sizes: ', 20)

Are there any algorithms you know of that would allow minimizing the consecutive runs of either or both properties, or at least keep these runs below a specified length? If the latter, let's say no more than 4 in a row of the same colour or size.


What I've tried:

The solution I have now basically does a slightly intelligent bogosort, which has to be horribly inefficient. Basically:

  • You break the entire list into chunks containing all the permutations of the properties: if you break down display_list into chunks of length 24, each chunk has each colour paired with each size. Let's assume that the trial list can always be broken down into these permutation chunks, since you know what the permutations are from the design of the experiment.
  • You choose a maximum run length per chunk
  • You shuffle each chunk until the run lengths for each chunk are below the maximum value (this actually means that in the overall trial list, your runs might be double that length, since you could have a run of this length at the end of one chunk and the start of the next)
like image 831
Marius Avatar asked May 27 '13 05:05

Marius


2 Answers

Question: Are there any algorithms you know of that would allow minimizing the consecutive runs of either or both properties, or at least keep these runs below a specified length?

Yes. There is an easy algorithm for doing this by simply reducing the probability of a color or size being chosen if it is already occurring in a run.

from random import randrange

def choose(colors, numselections, maxrun):
    'Repeatedly choose colors.  Gradually reduce selection probability to avoid runs.'
    colors = list(colors)
    n = len(colors)
    total = n * maxrun
    current_run = 0
    for _ in range(numselections):
        i = randrange(total - current_run) // maxrun
        yield colors[i]
        colors[i], colors[-1] = colors[-1], colors[i]
        current_run = current_run + 1 if i==n-1 else 1

if __name__ == '__main__':
    colors = ['red', 'blue', 'green', 'yellow']
    for color in choose(colors, 100, maxrun=4):
        print color

Note, this approach requires less effort than the other answers which use reselection techniques to avoid runs. Also, note the runs are faded-out gradually rather than all at once as in the other answers.

like image 93
Raymond Hettinger Avatar answered Oct 07 '22 01:10

Raymond Hettinger


You're clearly not concerned with anything like true randomness, so if you define a distance metric, and draw your sequence randomly, you can reject any new draw if it's distance is "too close" to the previous draw, and simply draw again.

If you're drawing from a finite set (say, a pack of cards) then the whole set can be the draw pile, and your sort would consist of swapping two elements when a close pair is found, but also reject a swap partner if the swapped element would become unacceptable, so each swap step leaves the whole set improved.

If your criteria are not too hard to satisfy, this will terminate very quickly.

like image 40
ddyer Avatar answered Oct 07 '22 01:10

ddyer