
What, if anything, is wrong with this shuffling algorithm and how can I know?

Tags:

algorithm

functional-programming

shuffle

Just as background, I'm aware of the Fisher-Yates perfect shuffle. It is a great shuffle with its O(n) complexity and its guaranteed uniformity and I'd be a fool not to use it ... in an environment that permits in-place updates of arrays (so in most, if not all, imperative programming environments).

Sadly the functional programming world doesn't give you access to mutable state.

Because of Fisher-Yates, however, there's not a lot of literature I can find on how to design a shuffling algorithm. The few places that address it at all do so briefly before saying, in effect, "so here's Fisher-Yates which is all the shuffling you need to know". I had to, in the end, come up with my own solution.

The solution I came up with works like this to shuffle any list of data:

- If the list is empty, return the empty list.
- If the list has a single item, return a list containing that single item.
- If the list has more than one item, partition the list with a random number generator and apply the algorithm recursively to each partition, assembling the results.

In Erlang code it looks something like this:

    shuffle([])  -> [];
    shuffle([L]) -> [L];
    shuffle(L)   ->
      {Left, Right} = lists:partition(fun(_) ->
                                        random:uniform() < 0.5
                                      end, L),
      shuffle(Left) ++ shuffle(Right).

(If this looks like a deranged quick sort to you, well, that's what it is, basically.)

So here's my problem: the same situation that makes finding shuffling algorithms that aren't Fisher-Yates difficult makes finding tools to analyse a shuffling algorithm equally difficult. There's lots of literature I can find on analysing PRNGs for uniformity, periodicity, etc. but not a lot of information out there on how to analyse a shuffle. (Indeed some of the information I found on analysing shuffles was just plain wrong -- easily deceived through simple techniques.)

So my question is this: how do I analyse my shuffling algorithm (assuming that the random:uniform() call up there is up to the task of generating appropriate random numbers with good characteristics)? What mathematical tools are there at my disposal to judge whether or not, say, 100,000 runs of the shuffler over a list of integers ranging 1..100 have given me plausibly good shuffling results? I've done a few tests of my own (comparing increments to decrements in the shuffles, for example), but I'd like to know a few more.

And if there's any insight into that shuffle algorithm itself that would be appreciated too.


asked Oct 15 '10 17:10

JUST MY correct OPINION

1 Answer

General remark

My personal approach to the correctness of algorithms that use randomness: if you know how to prove it's correct, then it's probably correct; if you don't, it's certainly wrong.

Said differently, it's generally hopeless to try to analyse every algorithm you could come up with: you have to keep looking for an algorithm until you find one that you can prove correct.

Analysing a random algorithm by computing the distribution

I know of one way to "automatically" analyse a shuffle (or more generally a randomized algorithm) that is stronger than the simple "throw lots of tests and check for uniformity". You can mechanically compute the distribution associated with each input of your algorithm.

The general idea is that a randomized algorithm explores a part of a world of possibilities. Each time your algorithm asks for a random element in a set ({true, false} when flipping a coin), there are several possible outcomes for your algorithm (two, for a coin flip), and one of them is chosen. You can change your algorithm so that, instead of returning one of the possible outcomes, it explores all of them in parallel and returns all possible outcomes with their associated probabilities.

In general, that would require rewriting your algorithm in depth. If your language supports delimited continuations, you don't have to; you can implement "exploration of all possible outcomes" inside the function asking for a random element (the idea is that the random generator, instead of returning a result, captures the continuation associated with your program and runs it with all the different results). For an example of this approach, see oleg's HANSEI.

An intermediate, and probably less arcane, solution is to represent this "world of possible outcomes" as a monad, and use a language such as Haskell with facilities for monadic programming. Here is an example implementation of a variant¹ of your algorithm, in Haskell, using the probability monad of the probability package:

    import Numeric.Probability.Distribution

    shuffleM :: (Num prob, Fractional prob) => [a] -> T prob [a]
    shuffleM [] = return []
    shuffleM [x] = return [x]
    shuffleM (pivot:li) = do
            (left, right) <- partition li
            sleft <- shuffleM left
            sright <- shuffleM right
            return (sleft ++ [pivot] ++ sright)
      where partition [] = return ([], [])
            partition (x:xs) = do
                      (left, right) <- partition xs
                      uniform [(x:left, right), (left, x:right)]

You can run it for a given input, and get the output distribution:

    *Main> shuffleM [1,2]
    fromFreqs [([1,2],0.5),([2,1],0.5)]
    *Main> shuffleM [1,2,3]
    fromFreqs
      [([2,1,3],0.25),([3,1,2],0.25),([1,2,3],0.125),
       ([1,3,2],0.125),([2,3,1],0.125),([3,2,1],0.125)]

You can see that this algorithm is uniform with inputs of size 2, but non-uniform on inputs of size 3.
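
For readers without the probability package at hand, the same exhaustive computation can be sketched in Python with exact fractions. This is only an illustration of the pivot variant above, not a replacement for the monadic version; the names `partitions` and `shuffle_dist` are made up for this sketch.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def partitions(items):
    """Yield every (left, right) split of items; all splits are equally likely."""
    for sides in product((True, False), repeat=len(items)):
        left = tuple(x for x, go_left in zip(items, sides) if go_left)
        right = tuple(x for x, go_left in zip(items, sides) if not go_left)
        yield left, right


def shuffle_dist(items):
    """Exact output distribution of the pivot variant: {permutation: probability}."""
    items = tuple(items)
    if len(items) <= 1:
        return {items: Fraction(1)}
    pivot, rest = items[0], items[1:]
    weight = Fraction(1, 2 ** len(rest))  # probability of each individual split
    dist = defaultdict(Fraction)
    for left, right in partitions(rest):
        for l_perm, l_prob in shuffle_dist(left).items():
            for r_perm, r_prob in shuffle_dist(right).items():
                dist[l_perm + (pivot,) + r_perm] += weight * l_prob * r_prob
    return dict(dist)


for perm, prob in sorted(shuffle_dist([1, 2, 3]).items()):
    print(perm, prob)
```

Because the probabilities are exact fractions, a non-uniform result such as 1/4 versus 1/8 cannot be blamed on sampling noise.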

The difference with the test-based approach is that we can gain absolute certainty in a finite number of steps: that number can be quite big, as it amounts to an exhaustive exploration of the world of possibilities (but generally smaller than 2^N, as similar outcomes get factored together), but if it returns a non-uniform distribution we know for sure that the algorithm is wrong. Of course, if it returns a uniform distribution for [1..N] and 1 <= N <= 100, you only know that your algorithm is uniform up to lists of size 100; it may still be wrong for longer lists.
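
For comparison, here is roughly what the test-based approach looks like. The sketch below is a Python transliteration of the Erlang shuffle from the question, with `random.Random` standing in for `random:uniform()`; the function names and the choice of a Pearson chi-square statistic over whole permutations are assumptions of this sketch, and it is only practical for small lists because there are n! permutations to count.

```python
import random
from collections import Counter
from itertools import permutations
from math import factorial


def shuffle(items, rng):
    """Python transliteration of the question's recursive partition shuffle."""
    if len(items) <= 1:
        return list(items)
    left, right = [], []
    for x in items:
        (left if rng.random() < 0.5 else right).append(x)
    return shuffle(left, rng) + shuffle(right, rng)


def chi_square_uniformity(n, runs, rng):
    """Pearson chi-square statistic of observed permutation counts vs. uniform."""
    counts = Counter(tuple(shuffle(list(range(n)), rng)) for _ in range(runs))
    expected = runs / factorial(n)
    return sum((counts[p] - expected) ** 2 / expected
               for p in permutations(range(n)))


stat = chi_square_uniformity(n=3, runs=60_000, rng=random.Random(2010))
print(f"chi-square statistic ({factorial(3) - 1} degrees of freedom): {stat:.2f}")
```

The statistic is then compared with a chi-square distribution with n! - 1 degrees of freedom (for 5 degrees of freedom, values above about 20.5 occur less than 0.1% of the time under uniformity). A small value is only evidence, never proof, which is exactly the limitation discussed above.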

¹: this algorithm is a variant of your Erlang implementation, because of the specific pivot handling. If I use no pivot, as in your case, the input size no longer decreases at each step: the algorithm also considers the case where all inputs are in the left list (or right list), and gets lost in an infinite loop. This is a weakness of the probability monad implementation (even if an algorithm has probability 0 of non-termination, the distribution computation may still diverge) that I don't yet know how to fix.

Sort-based shuffles

Here is a simple algorithm that I feel confident I could prove correct:

1. Pick a random key for each element in your collection.
2. If the keys are not all distinct, restart from step 1.
3. Sort the collection by these random keys.

You can omit step 2 if you know the probability of a collision (two random numbers picked are equal) is sufficiently low, but without it the shuffle is not perfectly uniform.

If you pick your keys in [1..N] where N is the length of your collection, you'll have lots of collisions (Birthday problem). If you pick your key as a 32-bit integer, the probability of conflict is low in practice, but still subject to the birthday problem.
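
As a minimal sketch of the three steps with finite keys, assuming Python's `random.Random.getrandbits` as the key source (the function name `shuffle_by_keys` and the `key_bits` parameter are illustrative, not from any library):

```python
import random


def shuffle_by_keys(items, rng, key_bits=32):
    """Sort by random integer keys, redrawing all keys whenever two collide."""
    items = list(items)
    while True:
        # Step 1: one independent random key per element.
        keys = [rng.getrandbits(key_bits) for _ in items]
        # Step 2: accept only if all keys are distinct, otherwise start over.
        if len(set(keys)) == len(keys):
            break
    # Step 3: sort the elements by their keys.
    order = sorted(range(len(items)), key=keys.__getitem__)
    return [items[i] for i in order]


print(shuffle_by_keys("abcdef", random.Random()))
```

Redrawing every key on a collision, rather than only the colliding ones, keeps the accepted keys exchangeable, which is what the uniformity argument relies on.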

If you use infinite (lazily evaluated) bitstrings as keys, rather than finite-length keys, the probability of a collision becomes 0, and checking for distinctness is no longer necessary.

Here is a shuffle implementation in OCaml, using lazy real numbers as infinite bitstrings:

    type 'a stream = Cons of 'a * 'a stream lazy_t

    let rec real_number () =
      Cons (Random.bool (), lazy (real_number ()))

    let rec compare_real a b = match a, b with
    | Cons (true, _), Cons (false, _) -> 1
    | Cons (false, _), Cons (true, _) -> -1
    | Cons (_, lazy a'), Cons (_, lazy b') ->
        compare_real a' b'

    let shuffle list =
      List.map snd
        (List.sort (fun (ra, _) (rb, _) -> compare_real ra rb)
           (List.map (fun x -> real_number (), x) list))

There are other approaches to "pure shuffling". A nice one is apfelmus's mergesort-based solution.

Algorithmic considerations: the complexity of the previous algorithm depends on the probability that all keys are distinct. If you pick them as 32-bit integers, any given pair of keys collides with probability about one in 4 billion (2^{-32}), so for n keys the probability of having to restart is roughly n(n-1)/2^{33} as long as that value is small. Sorting by these keys is O(n log n), assuming picking a random number is O(1).
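
To put numbers on the restart probability, the birthday-problem product can be evaluated directly. This is a small sketch assuming independent, uniformly distributed keys; `collision_probability` is a name chosen here for illustration.

```python
from math import expm1, log1p


def collision_probability(n, key_bits=32):
    """Probability that n independent uniform key_bits-bit keys are not all distinct."""
    space = 2 ** key_bits
    if n > space:
        return 1.0  # pigeonhole: more keys than possible values
    # P(all distinct) = prod_{i=0}^{n-1} (1 - i/space); summing logs avoids
    # losing precision when each factor is extremely close to 1.
    log_distinct = sum(log1p(-i / space) for i in range(n))
    return -expm1(log_distinct)


for n in (2, 100, 10_000, 77_163):
    print(n, collision_probability(n))
```

With 32-bit keys the collision probability is on the order of 10^-6 for 100 elements and reaches about one half at roughly 77,000 elements, so step 2 matters more as lists grow.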

If you use infinite bitstrings, you never have to restart picking, but the complexity is then related to "how many elements of the streams are evaluated on average". I conjecture it is O(log n) on average (hence still O(n log n) in total), but have no proof.
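
The conjecture can at least be probed empirically. The following sketch mirrors the lazy-bitstring idea in Python and reports how many bits each key ended up drawing; the class `LazyBits` and the functions around it are hypothetical helpers written for this illustration, and a measurement like this proves nothing about the asymptotics.

```python
import random
from functools import cmp_to_key


class LazyBits:
    """A random bitstring whose bits are drawn only when a comparison needs them."""

    def __init__(self, rng):
        self.rng = rng
        self.bits = []

    def bit(self, i):
        while len(self.bits) <= i:
            self.bits.append(self.rng.getrandbits(1))
        return self.bits[i]


def compare_lazy(a, b):
    """Lexicographic comparison; stops at the first position where a and b differ."""
    if a is b:
        return 0  # guard: a key compared with itself would never differ
    i = 0
    while a.bit(i) == b.bit(i):
        i += 1
    return a.bit(i) - b.bit(i)


def lazy_shuffle(items, rng):
    """Return (shuffled list, average number of bits evaluated per key)."""
    keyed = [(LazyBits(rng), x) for x in items]
    keyed.sort(key=cmp_to_key(lambda p, q: compare_lazy(p[0], q[0])))
    total_bits = sum(len(key.bits) for key, _ in keyed)
    return [x for _, x in keyed], total_bits / max(len(keyed), 1)


for n in (10, 100, 1000):
    _, avg_bits = lazy_shuffle(list(range(n)), random.Random(n))
    print(f"n={n}: {avg_bits:.1f} bits evaluated per key on average")
```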

... and I think your algorithm works

After more reflection, I think (like douplep) that your implementation is correct. Here is an informal explanation.

Each element in your list is tested by several random:uniform() < 0.5 tests. To an element, you can associate the list of outcomes of those tests, as a list of booleans or {0, 1}. At the beginning of the algorithm, you don't know the list associated with any of those elements. After the first partition call, you know the first element of each list, etc. When your algorithm returns, the lists of test outcomes are completely known and the elements are sorted according to those lists (sorted in lexicographic order, or considered as binary representations of real numbers).

So, your algorithm is equivalent to sorting by infinite bitstring keys. The action of partitioning the list, reminiscent of quicksort's partition over a pivot element, is actually a way of separating, for a given position in the bitstring, the elements with valuation 0 from the elements with valuation 1.

The sort is uniform because the bitstrings are independent, identically distributed and all different, so by symmetry every ordering of the elements is equally likely. Indeed, two elements whose real numbers agree up to the n-th bit are on the same side of every partition down to recursion depth n. The algorithm only terminates when all the lists resulting from partitions are empty or singletons: all elements have then been separated by at least one test, and therefore have distinct binary expansions.
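
The equivalence can be checked mechanically on a concrete run. The sketch below is a Python transliteration of the question's shuffle that also records each element's test outcomes (0 for "went left", 1 for "went right"); `shuffle_with_paths` is a name invented for this illustration, and it assumes the elements are distinct and hashable.

```python
import random


def shuffle_with_paths(items, rng, path=()):
    """Partition shuffle that also returns each element's sequence of test outcomes."""
    if len(items) <= 1:
        return list(items), {x: path for x in items}
    left, right = [], []
    for x in items:
        (left if rng.random() < 0.5 else right).append(x)
    out_left, paths_left = shuffle_with_paths(left, rng, path + (0,))
    out_right, paths_right = shuffle_with_paths(right, rng, path + (1,))
    return out_left + out_right, {**paths_left, **paths_right}


shuffled, paths = shuffle_with_paths(list(range(20)), random.Random())
# The output order is exactly the lexicographic order of the recorded bitstrings.
assert shuffled == sorted(shuffled, key=paths.__getitem__)
print(shuffled)
```

The assertion holds for every run, whatever the random generator returns, because no recorded bitstring is a prefix of another once every element sits alone in its partition.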

Probabilistic termination

A subtle point about your algorithm (or my equivalent sort-based method) is that the termination condition is probabilistic. Fisher-Yates always terminates after a known number of steps (the number of elements in the array). With your algorithm, the termination depends on the output of the random number generator.

There are possible outputs that would make your algorithm diverge, not terminate. For example, if the random number generator always outputs 0, each partition call will return the input list unchanged, on which you recursively call the shuffle: you will loop indefinitely.

However, this is not an issue if you're confident that your random number generator is fair: it does not cheat and always returns independent, uniformly distributed results. In that case, the probability that the test random:uniform() < 0.5 always returns true (or false) is exactly 0:

- the probability that the first N calls return true is 2^{-N}
- the probability that all calls return true is the probability of the infinite intersection, for all N, of the events that the first N calls return true; it is the limit (here also the infimum)¹ of the 2^{-N}, which is 0

¹: for the mathematical details, see http://en.wikipedia.org/wiki/Measure_(mathematics)#Measures_of_infinite_intersections_of_measurable_sets

More generally, the algorithm fails to terminate if and only if at least two elements get associated with the same boolean stream. But the probability that two random boolean streams are equal is again 0: the probability that the digits at position K are equal is 1/2, so the probability that the first N digits are equal is 2^{-N}, and the same analysis applies.
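
Written out, the two limit arguments look as follows. The symbols are introduced only for this sketch: A_N is the event that the first N tests return true, E_{ij} is the event that elements i and j receive the same boolean stream, and n is the length of the list.

```latex
\begin{aligned}
P\Big(\bigcap_{N \ge 1} A_N\Big)
  &= \lim_{N \to \infty} P(A_N)
   = \lim_{N \to \infty} 2^{-N} = 0
  && \text{(continuity from above, since } A_1 \supseteq A_2 \supseteq \cdots\text{)} \\
P(\text{non-termination})
  &= P\Big(\bigcup_{i < j} E_{ij}\Big)
   \le \sum_{i < j} P(E_{ij})
   = \binom{n}{2} \cdot 0 = 0
  && \text{(union bound over pairs)}
\end{aligned}
```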

Therefore, you know that your algorithm terminates with probability 1. This is a slightly weaker guarantee than that of the Fisher-Yates algorithm, which always terminates. In particular, you're vulnerable to an attack by an evil adversary who controls your random number generator.

With more probability theory, you could also compute the distribution of running times of your algorithm for a given input length. This is beyond my technical abilities, but I assume it's good: I suppose that you only need to look at the first O(log N) digits on average to check that all N lazy streams are different, and that the probability of much higher running times decreases exponentially.
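
Short of a proof, the recursion depth of the partition shuffle can be measured by simulation. Only the sizes of the partitions matter for the depth, so this sketch tracks sizes rather than actual lists; `shuffle_depth` and `depth_summary` are illustrative names, and depth is used here as a rough proxy for how many digits must be inspected.

```python
import random
from math import log2


def shuffle_depth(n, rng):
    """Depth of the recursion tree of the partition shuffle on n elements."""
    if n <= 1:
        return 0
    n_left = sum(rng.random() < 0.5 for _ in range(n))
    return 1 + max(shuffle_depth(n_left, rng), shuffle_depth(n - n_left, rng))


def depth_summary(n, trials, rng):
    """Return (minimum, mean, maximum) recursion depth over several runs."""
    depths = [shuffle_depth(n, rng) for _ in range(trials)]
    return min(depths), sum(depths) / trials, max(depths)


lo, mean, hi = depth_summary(n=100, trials=1000, rng=random.Random(0))
print(f"n=100: min depth {lo}, mean {mean:.1f}, max {hi}; log2(n) = {log2(100):.1f}")
```

The depth can never be below ceil(log2 N), since each level at most doubles the number of partitions; how far above that it typically lands is what the simulation shows.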

answered Sep 22 '22 21:09

gasche
