Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fisher Yates variation

The classic Fisher Yates looks something like this:

void shuffle1(std::vector<int>& vec)
{
    int n = vec.size();
    for (int i = n - 1; i > 0; --i)
    {
        std::swap(vec[i], vec[rand() % (i + 1)]);
    }
}

Yesterday, I implemented the iteration "backwards" by mistake:

void shuffle2(std::vector<int>& vec)
{
    int n = vec.size();
    for (int i = 1; i < n; ++i)
    {
        std::swap(vec[i], vec[rand() % (i + 1)]);
    }
}

Is this version in any way worse (or better) than the first? Does it skew the resulting probabilities?

like image 893
fredoverflow Avatar asked Jan 14 '12 10:01

fredoverflow


2 Answers

Yes it's even distribution assuming rand() is. We will prove this by showing that each input can generate each permutation with equal probability.

N=2 can be easily proven. We will draw it as a tree where the the children represent each string you can get by inserting the character after comma into the leftmost string.

  0,1   //input where 0,1 represent indices
01  10  //output. Represents permutations of 01. It is clear that each one has equal probability

For N, we will have every permutations for N-1, and randomly swapping the last character for N

    (N-1 0th permutation),N     .....          (N-1 Ith permutation),N ________________________  
      /              \                       /                   \                             \ 
0th permutation of N  1st permutation....   (I*N)th permutation   ((I*N)+1)th permutation .... (I*N)+(I-1)th permutation

This shitty induction should lead you to it having even distribution.


Example:

N=2:

  0,1
01  10 // these are the permutations. Each one has equal probability

N=3:

           0,1|2           // the | is used to separate characters that we will insert later
    01,2           10,2    // 01, 10 are permutations from N-1, 2 is the new value
 210 021 012   201 120 102 // these are the permutations, still equal probability

N=4: (curved to aid reading)

                                                           0,1|23

                                                       01,2|3  10,2|3

                                           012,3 021,3 210,3    102,3 120,3 201,3

0123 0132 0321 3230                                                                                  2013 2031 2310 3012
                    0213 0231 0312 3210                                          1203 1230 1302 3201
                                        2103 2130 2301 3102  1023 1032 1320 3021

etc

like image 72
Pubby Avatar answered Sep 25 '22 12:09

Pubby


Looks OK to me (assuming rand() % N is unbiased, which it isn't). It seems it should be possible to demonstrate that every permutation of input is produced by exactly 1 sequence of random choices, where each random choice is balanced.

Compare this with a faulty implementation, such as

for (int i = 0; i < v.size(); ++i) {
  swap(v[i], v[rand() % v.size()]);
}

Here you can see that there are nn equally likely ways to produce n! permutations, and since nn is not evenly divisible by n! where n > 2, some of those permutations have to be produced more often than others.

like image 23
UncleBens Avatar answered Sep 24 '22 12:09

UncleBens