<code>Collections.shuffle()</code> goes through each index of a <code>Collection</code> backwards and then swaps it with a random index including or before it. I was wondering why, so I tried doing the same thing but swapping with any random index in the <code>Collection</code>. Here is the shuffling part of the Collections.shuffle() code: <pre class="prettyprint"><code>for (int i=size; i>1; i--) swap(arr, i-1, rnd.nextInt(i)); </code></pre> And here is my algorithm: <pre class="prettyprint"><code>Random r = new Random(); for (int i = 0; i < a.size(); i++) { int index = r.nextInt(a.size()); int temp = a.get(i); a.set(i, a.get(index)); a.set(index, temp); } </code></pre> I found that <code>Collections.shuffle()</code> was much more evenly distributed than my code when I ran both on the same <code>ArrayList</code> one million times. Also, when running my code on: <blockquote> [0, 1, 2, 3, 4] </blockquote> it seems that the following permutations consistently occur most frequently: <blockquote> [1, 0, 3, 4, 2] [1, 2, 3, 4, 0] [1, 2, 0, 4, 3] [0, 2, 3, 4, 1] [1, 2, 3, 0, 4] </blockquote> Could someone please explain why?

This is another explanation of Fisher-Yates. Consider the following method: <ol> <li>There are two lists, A and B. Initially, all the elements are on list A so list B is empty.</li> <li> At each step: Pick with uniform probability from the elements currently on list A. Permute list A to make the chosen element the last element. Remove the last element from list A and append it to list B. </li> <li>When list A is empty, return list B.</li> </ol> I find this algorithm easy to understand and visualize. The probability of a given item being chosen on the first step is <code>1/n</code>. The probability of a given item being chosen on the second step is its probability of not being chosen on the first step, <code>(n-1)/n</code>, times its probability of being chosen on the second step given it is still on list A, <code>1/(n-1)</code>. That product is <code>1/n</code>. Similarly, it has probability <code>((n-1)/n)*((n-2)/(n-1)) = (n-2)/n</code> of still being on list A after two items have been moved, and therefore a <code>1/n</code> probability of being the third item chosen. In general, the probability of still being on list A after <code>k</code> items have been chosen is <code>((n-1)/n)*((n-2)/(n-1))*...*((n-k)/(n-k+1)) = (n-k)/n</code>. The probability of being chosen on the next step, given the item is still on list A, is <code>1/(n-k)</code>, so the unconditional probability being chosen on the step when list A has <code>(n-k)</code> items is <code>((n-k)/n)*(1/(n-k)) = 1/n</code>. Fisher-Yates is just this algorithm with the two lists, whose total length is always <code>n</code>, concatenated in a single array. At each step, it selects an element from list A with uniform probability, permutes list A to put that element adjacent to list B, and then moves the boundary so that it changes from being a list A element to being the most recently added element of list B.

Why does the Collections.shuffle() algorithm work better than my implementation [duplicate]

Tags:

java

shuffle

Collections.shuffle() goes through each index of a Collection backwards and then swaps it with a random index including or before it. I was wondering why, so I tried doing the same thing but swapping with any random index in the Collection.

Here is the shuffling part of the Collections.shuffle() code:

for (int i=size; i>1; i--)
    swap(arr, i-1, rnd.nextInt(i));

And here is my algorithm:

Random r = new Random();
for (int i = 0; i < a.size(); i++) {
    int index = r.nextInt(a.size());
    int temp = a.get(i);
    a.set(i, a.get(index));
    a.set(index, temp);
}

I found that Collections.shuffle() was much more evenly distributed than my code when I ran both on the same ArrayList one million times. Also, when running my code on:

[0, 1, 2, 3, 4]

it seems that the following permutations consistently occur most frequently:

[1, 0, 3, 4, 2]
[1, 2, 3, 4, 0]
[1, 2, 0, 4, 3]
[0, 2, 3, 4, 1]
[1, 2, 3, 0, 4]

Could someone please explain why?

933

asked Apr 06 '15 14:04

czhao

2 Answers

Collections.Shuffle() does a Fisher-Yates shuffle. It's a more evenly distributed form of shuffling, and does not reshuffle what might have previously been shuffled already, as opposed to your algorithm.

What your algorithm does (also the known as the naive implementation) is that it will randomly pick any array index and shuffle it over, meaning there's a high chance of it picking the same index that has previously been shuffled already.

The Fisher-Yates shuffle (also known as the Donald Knuth Shuffle) is an unbiased algorithm that shuffles items in the array in an equally likely probability. It avoids the chance if it 'moving' the same objects twice.

Here's a good explanation of the Fisher Yates Shuffle by our own Jeff Atwood at Coding Horror, he discusses the naive implementation of the random shuffle versus the Fisher Yates shuffle.

See also this SO question about Java's implementation. It mentions what you asked. You can also look at the source code if you'd like, as mentioned there. I found it by Googling Collections.shuffle() in the top 5 links.

To further discuss this, it's always a good idea to use the Fisher-Yates shuffle compared to the naive implementations, especially in implementations that require a higher level of randomness (such as shuffle poker cards) to avoid introducing odds and unfair play. It wouldn't be a good thing if games of chance where implemented based on our naive implementation, as the bias leads to what you've observed, where the same permutation seems to show up more often than the others.

Lastly, as user @jmruc mentioned, here's a very nice tutorial on visualizing algorithms, it contains the Fisher-Yates shuffle, as well as other algorithms, all beautifully presented. Might help you wrap your head around the concepts if you're more of a visualizer: Visualizing Algorithms by Mike Bostock

125

answered Sep 27 '22 18:09

matrixanomaly

This is another explanation of Fisher-Yates.

Consider the following method:

There are two lists, A and B. Initially, all the elements are on list A so list B is empty.
At each step:

Pick with uniform probability from the elements currently on list A.

Permute list A to make the chosen element the last element.

Remove the last element from list A and append it to list B.
When list A is empty, return list B.

I find this algorithm easy to understand and visualize.

The probability of a given item being chosen on the first step is 1/n. The probability of a given item being chosen on the second step is its probability of not being chosen on the first step, (n-1)/n, times its probability of being chosen on the second step given it is still on list A, 1/(n-1). That product is 1/n.

Similarly, it has probability ((n-1)/n)*((n-2)/(n-1)) = (n-2)/n of still being on list A after two items have been moved, and therefore a 1/n probability of being the third item chosen.

In general, the probability of still being on list A after k items have been chosen is ((n-1)/n)*((n-2)/(n-1))*...*((n-k)/(n-k+1)) = (n-k)/n. The probability of being chosen on the next step, given the item is still on list A, is 1/(n-k), so the unconditional probability being chosen on the step when list A has (n-k) items is ((n-k)/n)*(1/(n-k)) = 1/n.

Fisher-Yates is just this algorithm with the two lists, whose total length is always n, concatenated in a single array. At each step, it selects an element from list A with uniform probability, permutes list A to put that element adjacent to list B, and then moves the boundary so that it changes from being a list A element to being the most recently added element of list B.

answered Sep 27 '22 18:09

Patricia Shanahan

Related questions
                            
                                How to set JSpinner as non editable?
                            
                                Error "HttpServletRequest refers to the missing type String"
                            
                                Subclassing an enum
                            
                                Get two lower bytes from int variable
                            
                                JAXB binding file: XmlAdapters and package name
                            
                                How do I find all the Points in a Path in Android?
                            
                                How to create a custom validator in Play Framework 2.0?
                            
                                Keeping Code Collapsed - Eclipse IDE
                            
                                ViewPager Fragments getting destroyed over time?
                            
                                Java Returning method which returns arraylist?
                            
                                HashMap returned by Maps.newHashMap vs new HashMap
                            
                                Java : javap ERROR:Could not find .class
                            
                                Java - using AtomicInteger vs Static int
                            
                                mybatis IllegalArgumentException: Mapped Statements collection does not contain value
                            
                                Java - Initialize a HashMap of HashMaps
                            
                                How to submit a topology in storm production cluster using IDE
                            
                                How to add a radio button group in a core java program such that only one radio button is selected at one time?
                            
                                How to convert MultipartFile into byte stream
                            
                                compress pdf with large images via java
                            
                                Set Query Parameters on a Jersey Test Call

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With