Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does failing to recognise equality mess up C# List<T> sort?

Tags:

c#

list

This is a somewhat obscure question, but after wasting an hour tracking down the bug, I though it worth asking...

I wrote a custom ordering for a struct, and made one mistake:

  • My struct has a special state, let us call this "min".
  • If the struct is in the min state, then it's smaller than any other struct.
  • My CompareTo method made one mistake: a.CompareTo(b) would return -1 whenever a was "min", but of course if b is also "min" it should return 0.

Now, this mistake completely messed up a List<MyStruct> Sort() method: the whole list would (sometimes) come out in a random order.

  • My list contained exactly one object in "min" state.
  • It seems my mistake could only affect things if the one "min" object was compared to itself.
  • Why would this even happen when sorting?
  • And even if it did, how can it cause the relative order of two "non-min" objects to be wrong?

Using the LINQ OrderBy method can cause an infinite loop...

Small, complete, test example:

struct MyStruct : IComparable<MyStruct>
{
    public int State;
    public MyStruct(int s) { State = s; }
    public int CompareTo(MyStruct rhs)
    {
        // 10 is the "min" state.  Otherwise order as usual
        if (State == 10) { return -1; } // Incorrect
        /*if (State == 10) // Correct version
        {
            if (rhs.State == 10) { return 0; }
            return -1;
        }*/
        if (rhs.State == 10) { return 1; }
        return this.State - rhs.State;
    }
    public override string ToString()
    {
        return String.Format("MyStruct({0})", State);
    }
}

class Program
{
    static int Main()
    {
        var list = new List<MyStruct>();
        var rnd = new Random();
        for (int i = 0; i < 20; ++i)
        {
            int x = rnd.Next(15);
            if (x >= 10) { ++x;  }
            list.Add(new MyStruct(x));
        }
        list.Add(new MyStruct(10));
        list.Sort();
        // Never returns...
        //list = list.OrderBy(item => item).ToList();

        Console.WriteLine("list:");
        foreach (var x in list) { Console.WriteLine(x); }

        for (int i = 1; i < list.Count(); ++i)
        {
            Console.Write("{0} ", list[i].CompareTo(list[i - 1]));
        }

        return 0;
    }
}
like image 590
Matthew Daws Avatar asked Jun 16 '15 10:06

Matthew Daws


People also ask

Why is equality so important?

Equality is about ensuring that every individual has an equal opportunity to make the most of their lives and talents. It is also the belief that no one should have poorer life chances because of the way they were born, where they come from, what they believe, or whether they have a disability.

What are the problems of gender equality?

Far too many girls, especially those from the poorest families, still face gender discrimination in education, child marriage and pregnancy, sexual violence and unrecognized domestic work. These are some types of gender inequality.

How does gender equality affect society?

Gender equality makes our communities safer and healthier Unequal societies are less cohesive. They have higher rates of anti-social behaviour and violence. Countries with greater gender equality are more connected. Their people are healthier and have better wellbeing.

Why does diversity and inclusion fail?

A D&I program cannot succeed in the absence of adequate representation of diverse groups of people. One key hindrance is the perception of diversity as anything other than meaning the inclusion of people of distinct ages, genders, ethnicities, social classes, income levels, and different cultural backgrounds.


1 Answers

It seems my mistake could only affect things if the one "min" object was compared to itself.

Not quite. It could also be caused if there were two different "min" objects. In the case of the list sorted this particular time, it can only happen if the item is compared to itself. But the other case is worth considering generally in terms of why supplying a non-transitive comparer to a method that expects a transitive comparer is a very bad thing.

Why would this even happen when sorting?

Why not?

List<T>.Sort() works by using the Array.Sort<T> on its items. Array.Sort<T> in turn uses a mixture of Insertion Sort, Heapsort and Quicksort, but to simplify let's consider a general quicksort. For simplicity we'll use IComparable<T> directly, rather than via System.Collections.Generic.Comparer<T>.Default:

public static void Quicksort<T>(IList<T> list) where T : IComparable<T>
{
  Quicksort<T>(list, 0, list.Count - 1);
}
public static void Quicksort<T>(IList<T> list, int left, int right) where T : IComparable<T>
{
  int i = left;
  int j = right;
  T pivot = list[(left + right) / 2];

  while(i <= j)
  {
    while(list[i].CompareTo(pivot) < 0)
      i++;

    while(list[j].CompareTo(pivot) > 0)
      j--;

    if(i <= j)
    {
      T tmp = list[i];
      list[i] = list[j];
      list[j] = tmp;
      i++;
      j--;
    }
  }

  if(left < j)
    Quicksort(list, left, j);

  if(i < right)
    Quicksort(list, i, right);
}

This works as follows:

  1. Pick an element, called a pivot, from the list(we use the middle).
  2. Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it.
  3. The pivot is now in its final position, with an unsorted sub-list before and after it. Recursively apply the same steps to these two sub-lists.

Now, there are two things to note about the example code above.

The first is that we do not prevent pivot being compared with itself. We could do this, but why would we? For one thing, we need some sort of comparison code to do this, which is precisely what you've already provided in your CompareTo() method. In order to avoid the wasted CompareTo we'd have to either call CompareTo()* an extra time for each comparison (!) or else track the position of pivot which would add more waste than it removed.

And even if it did, how can it cause the relative order of two "non-min" objects to be wrong?

Because quicksort partitions, it doesn't do one massive sort, but a series of mini-sorts. Therefore an incorrect comparison gets a series of opportunities to mess up parts of those sorts, each time leading to a sub-list of incorrectly sorted values that the algorithm considers "dealt with". So in those cases where the bug in the comparer hits, its damage can be spread throughout much of the list. Just as it does its sort by a series of mini-sorts, so it will do a buggy sort by a series of buggy mini-sorts.

Using the LINQ OrderBy method can cause an infinite loop

It uses a variant of Quicksort that guarantees stability; two equivalent item will still have the same relative order after the search as before. The extra complexity is presumably leading to it not only comparing the item to itself, but then continuing to do so forever, as it tries to make sure that it is both in front of itself, but also in the same order to itself as it was before. (Yes, that last sentence makes no sense, and that's exactly why it never returns).

*If this was a reference rather than value type then we could do ReferenceEquals quickly, but aside from the fact that this won't be any good with structs, and the fact that if that really was a time-saver for the type in question it should have if(ReferenceEquals(this, other)) return 0; in the CompareTo anyway, it still wouldn't fix the bug once there was more than one "min" items in the list.

like image 61
Jon Hanna Avatar answered Nov 08 '22 20:11

Jon Hanna