
Why is this shuffling algorithm wrong?

Before I read about Fisher-Yates, this is the algorithm I came up with:

import random as rand

def sort(arr):
    for i in range(len(arr)):
        # swap(arr, i, j) exchanges arr[i] and arr[j]; j spans the whole array
        swap(arr, i, rand.randint(0, len(arr) - 1))

From my understanding, the only difference between this and Fisher-Yates is that instead of:

swap(arr, i, rand.randint(0, len(arr) - 1))

I should write:

swap(arr, i, rand.randint(i, len(arr) - 1))

Could someone explain why the first algorithm is incorrect (i.e., why it does not produce a uniformly random shuffle)?

asked Aug 24 '12 at 03:08 by aerain


1 Answer

From Wikipedia:

Similarly, always selecting j from the entire range of valid array indices on every iteration also produces a result which is biased, albeit less obviously so. This can be seen from the fact that doing so yields n^n distinct possible sequences of swaps, whereas there are only n! possible permutations of an n-element array. Since n^n can never be evenly divisible by n! when n > 2 (as the latter is divisible by n−1, which shares no prime factors with n), some permutations must be produced by more of the n^n sequences of swaps than others. As a concrete example of this bias, observe the distribution of possible outcomes of shuffling a three-element array [1, 2, 3]. There are 6 possible permutations of this array (3! = 6), but the algorithm produces 27 possible shuffles (3^3 = 27). In this case, [1, 2, 3], [3, 1, 2], and [3, 2, 1] each result from 4 of the 27 shuffles, while each of the remaining 3 permutations occurs in 5 of the 27 shuffles.
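
The 4-of-27 and 5-of-27 counts quoted above can be checked by brute force. The following is a minimal sketch, not part of the Wikipedia text: it walks through every possible sequence of random indices the naive loop could draw for a three-element array and tallies the resulting permutations. The name `naive_shuffle_outcomes` is made up for this illustration.

```python
from collections import Counter
from itertools import product


def naive_shuffle_outcomes(n):
    """Count how many of the n**n swap sequences lead to each permutation."""
    counts = Counter()
    # Each tuple in the product fixes the index j drawn at step i = 0..n-1.
    for choices in product(range(n), repeat=n):
        arr = list(range(1, n + 1))
        for i, j in enumerate(choices):
            arr[i], arr[j] = arr[j], arr[i]
        counts[tuple(arr)] += 1
    return counts


counts = naive_shuffle_outcomes(3)
for perm, hits in sorted(counts.items()):
    print(perm, hits, "of", sum(counts.values()))
```

Because 27 sequences cannot be split evenly across 6 permutations, some permutations must collect more sequences than others, which is exactly the bias described in the quote.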

Essentially, you are introducing a subtle bias into the shuffle, which will cause some permutations to crop up a bit more often than others. It's often not very noticeable, but it could make some sensitive applications (e.g. Monte Carlo simulations on permutations) fail to produce accurate answers.
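
For comparison, here is a rough sketch of the corrected loop from the question, where j is drawn from i through the last index only. It also enumerates every sequence of choices that loop can make, to show that each permutation is reached by exactly one sequence. The names `fisher_yates_shuffle` and `fisher_yates_outcomes` and the `rng` parameter are assumptions made for this example, not anything from the question or from Wikipedia.

```python
import random
from collections import Counter
from itertools import product


def fisher_yates_shuffle(arr, rng=random):
    """Shuffle arr in place, drawing j only from the not-yet-fixed tail."""
    n = len(arr)
    for i in range(n - 1):
        j = rng.randint(i, n - 1)  # randint is inclusive on both ends
        arr[i], arr[j] = arr[j], arr[i]
    return arr


def fisher_yates_outcomes(n):
    """Count how many choice sequences lead to each permutation."""
    counts = Counter()
    # At step i the index j ranges over i..n-1, so there are
    # n * (n-1) * ... * 1 = n! sequences in total.
    for choices in product(*(range(i, n) for i in range(n))):
        arr = list(range(1, n + 1))
        for i, j in enumerate(choices):
            arr[i], arr[j] = arr[j], arr[i]
        counts[tuple(arr)] += 1
    return counts


uniform_counts = fisher_yates_outcomes(3)
print(uniform_counts)
print(fisher_yates_shuffle([1, 2, 3, 4, 5]))
```

With n! equally likely sequences mapping one-to-one onto the n! permutations, no permutation is favored, provided the underlying random number generator is itself uniform.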

answered Oct 13 '22 at 18:10 by nneonneo