Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Shuffle using IComparer

First of all, I do know about the Fisher-Yates shuffle. But lets say for arguments sake that I want to allow the user to pick a sort option from a Dropdown list. This list would include a "Random" option. Based on the result of their selection I just want to substitute in an IComparer instance for my sort. What would the IComparer look like?

Google brings up a plethora of flawed results that all take this form:

public class NaiveRandomizer<T> : IComparer<T>
{
    private static Random rand = new Random();

    public int Compare(T x, T y)
    {
        return (x.Equals(y))?0:rand.Next(-1, 2);
    }
}

However, that implementation is biased and will even throw an exception in some circumstances. The bias can be demonstrated with the following code:

void Test()
{
    Console.WriteLine("NaiveRandomizer Test:");
    var data = new List<int>() {1,2,3};
    var sortCounts = new Dictionary<string, int>(6);
    var randomly = new NaiveRandomizer<int>();

    for (int i=0;i<10000;i++)
    {   //always start with same list, in _the same order_.
        var dataCopy = new List<int>(data); 
        dataCopy.Sort(randomly);

        var key = WriteList(dataCopy);
        if (sortCounts.ContainsKey(key))
            sortCounts[key]++;
        else
            sortCounts.Add(key, 1);
    }

    foreach (KeyValuePair<string, int> item in sortCounts)
        Console.WriteLine(item.Key + "\t" + item.Value);
}

string WriteList<T>(List<T> list)
{
   string delim = "";
   string result = "";
   foreach(T item in list)
   {
       result += delim + item.ToString();
       delim = ", ";
   }
   return result;
}

So how could you implement a random IComparer<T> that solved those issues? It is allowed to require each call to .Sort() to use a separate IComparer instance, as I don't see any other way to do this: items must be compared using some other, truly random value, but that value must also be consistent for an item within a given sort operation.

I have a start here, but it was posted in haste, is extremely slow, and doesn't even return all possible sorts (testing shows that it does at least eliminate bias, if you don't count the missing options). I don't expect O(n) performance like Fisher-Yates, but I do want something reasonable (n log n for a small-ish n), and I do expect it to show all possible sorts. Unfortunately, that link is the current accepted answer for it's question and so I'm hoping to be able to replace it with something a little better.

If nothing else, I want this to be a magnet for all those google queries looking for an IComparable solution- that they'll end up here instead of somewhere else telling them to use the incorrect version.

like image 360
Joel Coehoorn Avatar asked Feb 17 '09 17:02

Joel Coehoorn


2 Answers

I was somewhat surprised in this thread how many wrong answers were posted. Just for the sake of others who come up with a solution similar to the one posted by the OP, the following code looks correct:

int[] nums = new int[1000];
for (int i = 0; i < nums.Length; i++)
{
    nums[i] = i;
}

Random r = new Random();
Array.Sort<int>(nums, (x, y) => r.Next(-1, 2));

foreach(var num in nums)
{
    Console.Write("{0} ", num);
}

However, the code will throw an exception occasionally, but not always. That's what makes it fun to debug :) If you run it enough times, or execute the sort procedure in a loop 50 or so times, you'll get an error stating:

IComparer (or the IComparable methods it relies upon) did not return zero when Array.Sort called x. CompareTo(x). x: '0' x's type: 'Int32' The IComparer: ''.

In other words, the quick sort compared some number x to itself and got a non-zero result. The obvious solution to the code would be write:

Array.Sort<int>(nums, (x, y) =>
    {
        if (x == y) return 0;
        else return r.NextDouble() < 0.5 ? 1 : -1;
    });

But even this doesn't work, because there are occasions where .NET compares 3 numbers against one another which return inconsistent results, such as A > B, B > C, and C > A (oops!). No matter if you use a Guid, GetHashCode, or any other randomly generated input, a solution like the one shown above is still wrong.


With that being said, Fisher-Yates is the standard way of shuffling arrays, so there's no real reason to use IComparer in the first place. Fisher-Yates is O(n) whereas any implementation using IComparer uses a quicksort behinds the scenes which has a time-complexity of O(n log n). There's just no good reason not to use the well-known, efficient, standard algorithm to solve this kind of problem.

However, if you really insist on using an IComparer and a rand, then apply your random data before you sort. This requires a projection of the data onto another object so you don't lose your random data:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Pair<T, U>
    {
        public T Item1 { get; private set; }
        public U Item2 { get; private set; }
        public Pair(T item1, U item2)
        {
            this.Item1 = item1;
            this.Item2 = item2;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Pair<int, double>[] nums = new Pair<int, double>[1000];
            Random r = new Random();
            for (int i = 0; i < nums.Length; i++)
            {
                nums[i] = new Pair<int, double>(i, r.NextDouble());
            }

            Array.Sort<Pair<int, double>>(nums, (x, y) => x.Item2.CompareTo(y.Item2));

            foreach (var item in nums)
            {
                Console.Write("{0} ", item.Item1);
            }

            Console.ReadKey(true);
        }
    }
}

Or get LINQy with your bad self:

Random r = new Random();
var nums = from x in Enumerable.Range(0, 1000)
           orderby r.NextDouble()
           select x;
like image 152
Juliet Avatar answered Oct 19 '22 23:10

Juliet


One suggestion I got elsewhere was to create a separate IArranger interface that describes a single operation to Arrange a collection. This can work where IComparer/IComparable cannot because it operates on an entire collection, instead of individual items. It might look something like this:

public interface IArranger<T>
{
    IEnumerable<T> Arrange(IEnumerable<T> items);
}

Then I could implement a Shuffle from the IArranger interface using a proper Fisher-Yates algorithm, and also have implementations that wrap each additional IEnumerable.Sort()/IComparable/IComparer varieties that I care about. That might look something like this:

public class ComparerArranger<T> : IArranger<T>
{
    private IComparer<T> comparer;

    public ComparableArranger(IComparer<T> comparer)
    {
        this.comparer = comparer;
    }

    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
       return items.OrderBy(i => i, comparer);
    }
}

or

//uses the default Comparer for the type (Comparer<T>.Default)
public class TypeArranger<T> : IArranger<T> 
{
    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
       return items.OrderBy(i => i);
    }
}

or

public class ShuffleArranger<T> : IArranger<T>
{
    //naive implementation for demonstration
    // if I ever develop this more completely I would try to
    // avoid needing to call .ToArray() in here
    // and use a better prng
    private Random r = new Random();

    public IEnumerable<T> Arrange(IEnumerable<T> items)
    {
        var values = items.ToArray();

        //valid Fisher-Yates shuffle on the values array
        for (int i = values.Length; i > 1; i--)
        {
            int j = r.Next(i);
            T tmp = values[j];
            values[j] = values[i - 1];
            values[i - 1] = tmp;
        }
        foreach (var item in values) yield return item;
    }
}

For a final step, I add support for this to any IEnumerable via an extension method. Then you still get the simple run-time algorithm swapping, you have a better implementation of the shuffle algorithm, and the code to use it feels natural:

public static IEnumerable<T> Arrange(this IEnumerable<T> items, IArranger<T> arranger)
{
    return arranger.Arrange(items);
}
like image 27
Joel Coehoorn Avatar answered Oct 19 '22 23:10

Joel Coehoorn