Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does the Collections.shuffle() algorithm work better than my implementation [duplicate]

Tags:

java

shuffle

Collections.shuffle() goes through each index of a Collection backwards and then swaps it with a random index including or before it. I was wondering why, so I tried doing the same thing but swapping with any random index in the Collection.

Here is the shuffling part of the Collections.shuffle() code:

for (int i=size; i>1; i--)
    swap(arr, i-1, rnd.nextInt(i));

And here is my algorithm:

Random r = new Random();
for (int i = 0; i < a.size(); i++) {
    int index = r.nextInt(a.size());
    int temp = a.get(i);
    a.set(i, a.get(index));
    a.set(index, temp);
}

I found that Collections.shuffle() was much more evenly distributed than my code when I ran both on the same ArrayList one million times. Also, when running my code on:

[0, 1, 2, 3, 4]

it seems that the following permutations consistently occur most frequently:

[1, 0, 3, 4, 2]
[1, 2, 3, 4, 0]
[1, 2, 0, 4, 3]
[0, 2, 3, 4, 1]
[1, 2, 3, 0, 4]

Could someone please explain why?

like image 933
czhao Avatar asked Apr 06 '15 14:04

czhao


People also ask

What does collections shuffle do Java?

Java Collections shuffle(list) Method. The shuffle(list) method is used to work by randomly reorders the specified list elements using a default randomness.

How do you optimally shuffle an array?

A simple solution is to create an auxiliary array temp[] which is initially a copy of arr[]. Randomly select an element from temp[], copy the randomly selected element to arr[0], and remove the selected element from temp[]. Repeat the same process n times and keep copying elements to arr[1], arr[2], … .

What is a fair shuffling algorithm?

In a random shuffle, you want to take the elements of a list and reorder them randomly. In a “fair” random shuffle, all possible permutations must be equally likely. It is surprisingly hard to come up with a fair algorithm.

How do you shuffle elements in an ArrayList?

In order to shuffle elements of ArrayList with Java Collections, we use the Collections. shuffle() method. The java. util.


2 Answers

Collections.Shuffle() does a Fisher-Yates shuffle. It's a more evenly distributed form of shuffling, and does not reshuffle what might have previously been shuffled already, as opposed to your algorithm.

What your algorithm does (also the known as the naive implementation) is that it will randomly pick any array index and shuffle it over, meaning there's a high chance of it picking the same index that has previously been shuffled already.

The Fisher-Yates shuffle (also known as the Donald Knuth Shuffle) is an unbiased algorithm that shuffles items in the array in an equally likely probability. It avoids the chance if it 'moving' the same objects twice.

Here's a good explanation of the Fisher Yates Shuffle by our own Jeff Atwood at Coding Horror, he discusses the naive implementation of the random shuffle versus the Fisher Yates shuffle.

See also this SO question about Java's implementation. It mentions what you asked. You can also look at the source code if you'd like, as mentioned there. I found it by Googling Collections.shuffle() in the top 5 links.

To further discuss this, it's always a good idea to use the Fisher-Yates shuffle compared to the naive implementations, especially in implementations that require a higher level of randomness (such as shuffle poker cards) to avoid introducing odds and unfair play. It wouldn't be a good thing if games of chance where implemented based on our naive implementation, as the bias leads to what you've observed, where the same permutation seems to show up more often than the others.

Lastly, as user @jmruc mentioned, here's a very nice tutorial on visualizing algorithms, it contains the Fisher-Yates shuffle, as well as other algorithms, all beautifully presented. Might help you wrap your head around the concepts if you're more of a visualizer: Visualizing Algorithms by Mike Bostock

like image 125
matrixanomaly Avatar answered Sep 27 '22 18:09

matrixanomaly


This is another explanation of Fisher-Yates.

Consider the following method:

  1. There are two lists, A and B. Initially, all the elements are on list A so list B is empty.
  2. At each step:

    Pick with uniform probability from the elements currently on list A.

    Permute list A to make the chosen element the last element.

    Remove the last element from list A and append it to list B.

  3. When list A is empty, return list B.

I find this algorithm easy to understand and visualize.

The probability of a given item being chosen on the first step is 1/n. The probability of a given item being chosen on the second step is its probability of not being chosen on the first step, (n-1)/n, times its probability of being chosen on the second step given it is still on list A, 1/(n-1). That product is 1/n.

Similarly, it has probability ((n-1)/n)*((n-2)/(n-1)) = (n-2)/n of still being on list A after two items have been moved, and therefore a 1/n probability of being the third item chosen.

In general, the probability of still being on list A after k items have been chosen is ((n-1)/n)*((n-2)/(n-1))*...*((n-k)/(n-k+1)) = (n-k)/n. The probability of being chosen on the next step, given the item is still on list A, is 1/(n-k), so the unconditional probability being chosen on the step when list A has (n-k) items is ((n-k)/n)*(1/(n-k)) = 1/n.

Fisher-Yates is just this algorithm with the two lists, whose total length is always n, concatenated in a single array. At each step, it selects an element from list A with uniform probability, permutes list A to put that element adjacent to list B, and then moves the boundary so that it changes from being a list A element to being the most recently added element of list B.

like image 38
Patricia Shanahan Avatar answered Sep 27 '22 18:09

Patricia Shanahan