I am having trouble figuring out a decent way of randomly shuffling the elements in an std::vector
and, after some operations, restoring the original order. I know that this should be a rather trivial algorithm, but I guess I'm too tired...
Since I am constrained to use a custom random number generator class, I guess I can't use std::random_shuffle
, which doesn't help anyway, because I also need to preserve the original order. So, my approach was to create an std::map
which serves as a mapping between the original positions and the random ones, like this:
std::map<unsigned int, unsigned int> getRandomPermutation (const unsigned int &numberOfElements)
{
std::map<unsigned int, unsigned int> permutation;
//populate the map
for (unsigned int i = 0; i < numberOfElements; i++)
{
permutation[i] = i;
}
//randomize it
for (unsigned int i = 0; i < numberOfElements; i++)
{
//generate a random number in the interval [0, numberOfElements)
unsigned long randomValue = GetRandomInteger(numberOfElements - 1U);
//broken swap implementation
//permutation[i] = randomValue;
//permutation[randomValue] = i;
//use this instead:
std::swap(permutation[i], permutation[randomValue]);
}
return permutation;
}
I am not sure that the above algorithm is a proper implementation for a random permutation, so any improvements are welcome.
Now, here is how I've managed to make use of this permutation map:
std::vector<BigInteger> doStuff (const std::vector<BigInteger> &input)
{
/// Permute the values in a random order
std::map<unsigned int, unsigned int> permutation = getRandomPermutation(static_cast<unsigned int>(input.size()));
std::vector<BigInteger> temp;
//permute values
for (unsigned int i = 0; i < static_cast<unsigned int>(input.size()); ++i)
{
temp.push_back(input[permutation[i]]);
}
//do all sorts of stuff with temp
/// Reverse the permutation
std::vector<BigInteger> output;
for (unsigned int i = 0; i < static_cast<unsigned int>(input.size()); ++i)
{
output.push_back(temp[permutation[i]]);
}
return output;
}
Something tells me that I should be able to use only one std::vector<BigInteger>
for this algorithm, but, right now, I just can't figure out the optimal solution. Honestly, I don't really care about the data in input
, so I could even make it non-const, overwrite it, and skip creating a copy of it, but the question is how to implement the algorithm?
If I do something like this, I end up shooting myself in the foot, right? :)
for (unsigned int i = 0; i < static_cast<unsigned int>(input.size()); ++i)
{
BigInteger aux = input[i];
input[i] = input[permutation[i]];
input[permutation[i]] = aux;
}
EDIT: Following Steve's remark about using "Fisher-Yates" shuffle, I changed my getRandomPermutation
function accordingly:
std::map<unsigned int, unsigned int> getRandomPermutation (const unsigned int &numberOfElements)
{
std::map<unsigned int, unsigned int> permutation;
//populate the map
for (unsigned int i = 0; i < numberOfElements; i++)
{
permutation[i] = i;
}
//randomize it
for (unsigned int i = numberOfElements - 1; i > 0; --i)
{
//generate a random number in the interval [0, numberOfElements)
unsigned long randomValue = GetRandomInteger(i);
std::swap(permutation[i], permutation[randomValue]);
}
return permutation;
}
Def: A uniform random permutation is one in which each of the n! possible permutations are equally likely.
In cryptography, a pseudorandom permutation (PRP) is a function that cannot be distinguished from a random permutation (that is, a permutation selected at random with uniform probability, from the family of all permutations on the function's domain) with practical effort.
Random Permutations of Elements A permutation refers to an arrangement of elements. e.g. [3, 2, 1] is a permutation of [1, 2, 3] and vice-versa. The NumPy Random module provides two methods for this: shuffle() and permutation() .
numpy.random. permutation (x) Randomly permute a sequence, or return a permuted range. If x is a multi-dimensional array, it is only shuffled along its first index.
If you're "randomising" a vector of n elements, you can create another std::vector<size_t> index(n)
, set index[x] = x
for 0 <= x < n
, then shuffle index
. Then your lookups take the form: original_vector[index[i]]
. The order of the original vector's never changed so no need to restore ordering.
...constrained to use a custom random number generator class, I guess I can't use
std::random_shuffle
...
Have you noticed this overload?
template <class RandomAccessIterator, class RandomNumberGenerator>
void random_shuffle ( RandomAccessIterator first, RandomAccessIterator last,
RandomNumberGenerator& rand );
For details of how to wrap your random number generator with a compatible object, see http://www.sgi.com/tech/stl/RandomNumberGenerator.html
If you're looking for specific errors in your code:
permutation[i] = randomValue;
permutation[randomValue] = i;
is wrong. Observe that once you're finished, each value does not necessarily appear exactly once among the values of the map. So it's not a permutation, let alone a uniformly-distributed random one.
The proper means to generate a random permutation is what Tony says, use std::random_shuffle
on a vector that initially represents the identity permutation. Or if you want to know how a shuffle is properly performed, look up "Fisher-Yates". In general, any approach that makes N
random selections uniformly from 0 .. N-1
is doomed to failure, because that means it has N^N
possible ways it can run. But there are N!
possible permutations of N
items, and N^N
is generally not divisible by N!
. Hence it's impossible for each permutation to be the result of an equal number of random selections, i.e. the distribution is not uniform.
the question is how to implement the algorithm?
So, you have your permutation, and you want to re-order the elements of input
in-place, according to that permutation.
The key thing to know is that every permutation is a composition of "cycles". That is to say, if you repeatedly follow the permutation from a given starting point, you come back to where you started (and this path is the cycle to which that starting point belongs). There may be more than one such cycle in a given permutation, and if permutation[i] == i
for some i
, then the cycle of i
has length 1.
The cycles are all disjoint, that is to say each element appears in exactly one cycle. Because cycles don't "interfere" with each other, we can apply a permutation by applying each cycle, and we can do the cycles in any order. So, for each index i
we need to:
i
. If so, move on to the next index.current = i
index[current]
with index[permutation[current]]
. So index[current]
is set to its correct value (the next element in the cycle), and its old value is "pushed" forward along the cycle.current
as "done"permutuation[current]
is i
, we've finished the cycle. So the first value of the cycle ends up in the spot formerly occupied by the last element of the cycle, which is right. Move on to the next index.current = permutation[current]
and go back to the swap step.Depending on the types involved, you can optimize around the swaps - it may be better to copy/move to a temporary variable and the start of each cycle, then do a copy/move instead of a swap at each step of the cycle, and finally copy/move the temporary to the end of the cycle.
Reversing the process is the same, but using the "inverse" of the permutation. The inverse inv
of a permutation perm
, is the permutation such that inv[perm[i]] == i
for each i
. You can either compute the inverse and use the exact code above, or you can use code similar to the above, except move the elements in the opposite direction along each cycle.
An alternative to all that, since you implemented Fisher-Yates yourself -- as you're running Fisher-Yates, for each swap you perform record the two indices swapped in a vector<pair<size_t,size_t>>
. Then you don't have to worry about cycles. You can apply the permutation to the vector by applying the same sequence of swaps. You can reverse the permutation by applying the reversed sequence of swaps.
Note that, depending on your application, if it is important that you have a truly uniformly distributed permutation, you cannot use any algorithm that calls a typical pseudo-random number generator more that once.
The reason is that most pseudo-random number generators, such as the one in clib, are Linear congruential. Those have a weakness where they'll generate numbers that cluster in planes - so your permutations will not be perfectly uniformly distributed. Using a higher-quality generator should get around that.
See http://en.wikipedia.org/wiki/Linear_congruential_generator
Alternatively, you could just generate a single random number in the range 0..(n!-1) and pass it to the unrank function for permutations. For small enough n, you can store those and get a constant time algorithm, but if n is too large for that, the best unrank function is O(n). Applying the resulting permutation is going to be O(n) anyway.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With