Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does this algorithm produce uniformly-random permuations?

I am studying CLRS and found a problem on shuffling algorithm. Does this produce a uniformly random permutations?

1 PERMUTE-WITH-ALL-IDENTITY(A)
2    n = A.length
3    for i = 1 to n
4        swap A[i] with A[RANDOM(1,n)]
5        swap A[i] with A[RANDOM(i+1,n)] 

My claim: No. it does not. Because, there will be n^n possible permutations due to line 4. And, it is not divisible by n! which is number of distinct permutations.

Can you, please, confirm if my reasoning is correct?

like image 581
Bek Abdik Avatar asked May 11 '15 14:05

Bek Abdik


2 Answers

For starters, I think there's a bug in the pseudocode. In line 4, I believe that there's a bounds error when i = n, since that would ask for a random number between n+1 and n. In what follows, I've corrected this by assuming the code is the following:

1 PERMUTE-WITH-ALL-IDENTITY(A)
2    n = A.length
3    for i = 1 to n
4        swap A[i] with A[RANDOM(1,n)]
5        swap A[i] with A[RANDOM(i,n)] 

If this is the code, then I believe the answer is no, this does not produce uniformly-random permutations. I don't have a mathematical proof that this is the case, but I do have a piece of C++ code that brute-forces all possible paths through PERMUTE-WITH-ALL-IDENTITY and counts the number of times that each permutation is produced. I ran that code and tested the algorithm on permutations on sequences of length 0 through 4, inclusive, and found that not all permutations are equally likely.

Here's the code:

#include <iostream>
#include <map>
#include <vector>
#include <algorithm>
using namespace std;

/* Maximum size of a sequence to permute. */
const size_t kMaxSize = 4;

/**
 * Given a frequencies map associating permutations to the number of times
 * that we've seen them, displays a visual report of the permutations
 * and their frequencies.
 *
 * @param frequencies The frequencies map.
 */
void reportResults(const map<vector<int>, size_t>& frequencies, size_t size) {
  cout << "Report for size " << size << endl;
  cout << "===================================================" << endl;  

  /* Print out the map. */
  cout << "  Map entries:" << endl;
  for (const auto& entry: frequencies) {
    cout << "    ";
    for (const auto& num: entry.first) {
      cout << num;
    }
    cout << ": " << entry.second << endl;
  }

  cout << "===================================================" << endl;
  cout << endl << endl;
}

/**
 * Traces through all possible executions of the algorithm, recording
 * the number of times that each outcome occurs. This algorithm uses
 * exhaustive recursion to explore the full space, assuming that the
 * underlying random generator is uniform.
 *
 * @param elems The elements in the sequence. It's assumed that initially
 *    these are in sorted order, but as the algorithm progresses it will
 *    become progressively more permuted.
 * @param frequencies A map from permutations to the number of times that
 *    they appear.
 * @param index The index through the main loop that we are currently in.
 *    This is the variable 'i' in the pseudocode.
 * @param size The length of the vector. This isn't strictly necessary,
 *    but it makes the code a bit cleaner.
 */
void recPopulate(const vector<int>& elems,
                 map<vector<int>, size_t>& frequencies,
                 size_t index, size_t size) {
  /* If we've finished permuting everything, record in the frequency map
   * that we've seen this permutation one more time.
   */
  if (index == size) {
    frequencies[elems]++;
  } else {
    /* For all possible pairs of a first swap and a second swap, try that
     * out and see what happens.
     */
    for (size_t i = 0; i < size; i++) {        // First swap index
      for (size_t j = index; j < size; j++) {  // Second swap index
        /* Make a clean copy of the vector to permute. */
        vector<int> newElems = elems;

        /* Perform the swaps. */
        swap(newElems[i], newElems[index]);
        swap(newElems[j], newElems[index]);

        /* Recursively explore all possible executions of the algorithm
         * from this point forward.
         */
        recPopulate(newElems, frequencies, index + 1, size);
      }
    }
  }
}

/**
 * Traces through all possible executions of the proposed algorithm,
 * building a frequency map associating each permutation with the
 * number of possible executions that arrive there.
 *
 * @param size The number of elements in the initial sequence.
 * @return A frequency map from permutations to the number of times that
 *    permutation can be generated.
 */
map<vector<int>, size_t> populateFrequencies(size_t size) {
  /* Create the sequence 0, 1, 2, ..., size - 1 */
  vector<int> elems(size);
  iota(elems.begin(), elems.end(), 0);

  map<vector<int>, size_t> frequencies;
  recPopulate(elems, frequencies, 0, elems.size());
  return frequencies;
}

int main() {
  for (size_t size = 0; size <= kMaxSize; size++) {
    map<vector<int>, size_t> frequencies = populateFrequencies(size);
    reportResults(frequencies, size);
  }
}

This program generates reports that show, for each permutation, the number of possible execution paths through the algorithm that produce that permutation. The output for permutations of four elements is shown below. Given that each execution path is equally likely, since the numbers here aren't the same, we can conclude that the algorithm is not uniformly random:

Report for size 4
===================================================
  Map entries:
    0123: 214
    0132: 268
    0213: 267
    0231: 316
    0312: 242
    0321: 229
    1023: 268
    1032: 322
    1203: 312
    1230: 349
    1302: 287
    1320: 262
    2013: 242
    2031: 283
    2103: 233
    2130: 262
    2301: 252
    2310: 240
    3012: 213
    3021: 208
    3102: 204
    3120: 187
    3201: 248
    3210: 236
===================================================

The above analysis hinges on the fact that

  1. My code is correct, and
  2. My interpretation of the pseudocode is correct.

If either of these are wrong, I'd be happy to retract or edit this answer. Please let me know if I've made a mistake here!

Hope this helps!

like image 102
templatetypedef Avatar answered Sep 17 '22 07:09

templatetypedef


The analysis above suggests a chi of 147 with 23 degrees of freedom, which means that the P-Value is < 0.00001. This indicates a very poor fit with a presumed uniform distribution.

But.

There only seems to be 6144 total samples. If you're looking at randomness, I would have thought that more runs would be appropriate. It could be that the P-Value moves towards a more favourable position after a 1000ish runs. Don't reseed the random generator between runs though.

like image 25
Paul Uszak Avatar answered Sep 20 '22 07:09

Paul Uszak