I am studying CLRS and found a problem on shuffling algorithm. Does this produce a uniformly random permutations?
1 PERMUTE-WITH-ALL-IDENTITY(A)
2 n = A.length
3 for i = 1 to n
4 swap A[i] with A[RANDOM(1,n)]
5 swap A[i] with A[RANDOM(i+1,n)]
My claim: No. it does not. Because, there will be n^n possible permutations due to line 4. And, it is not divisible by n! which is number of distinct permutations.
Can you, please, confirm if my reasoning is correct?
For starters, I think there's a bug in the pseudocode. In line 4, I believe that there's a bounds error when i = n, since that would ask for a random number between n+1 and n. In what follows, I've corrected this by assuming the code is the following:
1 PERMUTE-WITH-ALL-IDENTITY(A)
2 n = A.length
3 for i = 1 to n
4 swap A[i] with A[RANDOM(1,n)]
5 swap A[i] with A[RANDOM(i,n)]
If this is the code, then I believe the answer is no, this does not produce uniformly-random permutations. I don't have a mathematical proof that this is the case, but I do have a piece of C++ code that brute-forces all possible paths through PERMUTE-WITH-ALL-IDENTITY
and counts the number of times that each permutation is produced. I ran that code and tested the algorithm on permutations on sequences of length 0 through 4, inclusive, and found that not all permutations are equally likely.
Here's the code:
#include <iostream>
#include <map>
#include <vector>
#include <algorithm>
using namespace std;
/* Maximum size of a sequence to permute. */
const size_t kMaxSize = 4;
/**
* Given a frequencies map associating permutations to the number of times
* that we've seen them, displays a visual report of the permutations
* and their frequencies.
*
* @param frequencies The frequencies map.
*/
void reportResults(const map<vector<int>, size_t>& frequencies, size_t size) {
cout << "Report for size " << size << endl;
cout << "===================================================" << endl;
/* Print out the map. */
cout << " Map entries:" << endl;
for (const auto& entry: frequencies) {
cout << " ";
for (const auto& num: entry.first) {
cout << num;
}
cout << ": " << entry.second << endl;
}
cout << "===================================================" << endl;
cout << endl << endl;
}
/**
* Traces through all possible executions of the algorithm, recording
* the number of times that each outcome occurs. This algorithm uses
* exhaustive recursion to explore the full space, assuming that the
* underlying random generator is uniform.
*
* @param elems The elements in the sequence. It's assumed that initially
* these are in sorted order, but as the algorithm progresses it will
* become progressively more permuted.
* @param frequencies A map from permutations to the number of times that
* they appear.
* @param index The index through the main loop that we are currently in.
* This is the variable 'i' in the pseudocode.
* @param size The length of the vector. This isn't strictly necessary,
* but it makes the code a bit cleaner.
*/
void recPopulate(const vector<int>& elems,
map<vector<int>, size_t>& frequencies,
size_t index, size_t size) {
/* If we've finished permuting everything, record in the frequency map
* that we've seen this permutation one more time.
*/
if (index == size) {
frequencies[elems]++;
} else {
/* For all possible pairs of a first swap and a second swap, try that
* out and see what happens.
*/
for (size_t i = 0; i < size; i++) { // First swap index
for (size_t j = index; j < size; j++) { // Second swap index
/* Make a clean copy of the vector to permute. */
vector<int> newElems = elems;
/* Perform the swaps. */
swap(newElems[i], newElems[index]);
swap(newElems[j], newElems[index]);
/* Recursively explore all possible executions of the algorithm
* from this point forward.
*/
recPopulate(newElems, frequencies, index + 1, size);
}
}
}
}
/**
* Traces through all possible executions of the proposed algorithm,
* building a frequency map associating each permutation with the
* number of possible executions that arrive there.
*
* @param size The number of elements in the initial sequence.
* @return A frequency map from permutations to the number of times that
* permutation can be generated.
*/
map<vector<int>, size_t> populateFrequencies(size_t size) {
/* Create the sequence 0, 1, 2, ..., size - 1 */
vector<int> elems(size);
iota(elems.begin(), elems.end(), 0);
map<vector<int>, size_t> frequencies;
recPopulate(elems, frequencies, 0, elems.size());
return frequencies;
}
int main() {
for (size_t size = 0; size <= kMaxSize; size++) {
map<vector<int>, size_t> frequencies = populateFrequencies(size);
reportResults(frequencies, size);
}
}
This program generates reports that show, for each permutation, the number of possible execution paths through the algorithm that produce that permutation. The output for permutations of four elements is shown below. Given that each execution path is equally likely, since the numbers here aren't the same, we can conclude that the algorithm is not uniformly random:
Report for size 4
===================================================
Map entries:
0123: 214
0132: 268
0213: 267
0231: 316
0312: 242
0321: 229
1023: 268
1032: 322
1203: 312
1230: 349
1302: 287
1320: 262
2013: 242
2031: 283
2103: 233
2130: 262
2301: 252
2310: 240
3012: 213
3021: 208
3102: 204
3120: 187
3201: 248
3210: 236
===================================================
The above analysis hinges on the fact that
If either of these are wrong, I'd be happy to retract or edit this answer. Please let me know if I've made a mistake here!
Hope this helps!
The analysis above suggests a chi of 147 with 23 degrees of freedom, which means that the P-Value is < 0.00001. This indicates a very poor fit with a presumed uniform distribution.
But.
There only seems to be 6144 total samples. If you're looking at randomness, I would have thought that more runs would be appropriate. It could be that the P-Value moves towards a more favourable position after a 1000ish runs. Don't reseed the random generator between runs though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With