Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does ordering with Linq-to-Objects compare items to themselves?

Consider the following simple code with LINQ OrderBy and ThenBy:

static void Main()
{
  var arr1 = new[] { "Alpha", "Bravo", "Charlie", };

  var coStr = Comparer<string>.Create((x, y) =>
  {
    Console.WriteLine($"Strings: {x} versus {y}");
    return string.CompareOrdinal(x, y);
  });

  arr1.OrderBy(x => x, coStr).ToList();

  Console.WriteLine("--");

  var arr2 = new[]
  {
    new { P = "Alpha", Q = 7, },
    new { P = "Bravo", Q = 9, },
    new { P = "Charlie", Q = 13, },
  };

  var coInt = Comparer<int>.Create((x, y) =>
  {
    Console.WriteLine($"Ints: {x} versus {y}");
    return x.CompareTo(y);
  });

  arr2.OrderBy(x => x.P, coStr).ThenBy(x => x.Q, coInt).ToList();
}

This simply uses some comparers that write out to the console what they compare.

On my hardware and version of the Framework (.NET 4.6.2), this is the output:

Strings: Bravo versus Alpha
Strings: Bravo versus Bravo
Strings: Bravo versus Charlie
Strings: Bravo versus Bravo
--
Strings: Bravo versus Alpha
Strings: Bravo versus Bravo
Ints: 9 versus 9
Strings: Bravo versus Charlie
Strings: Bravo versus Bravo
Ints: 9 versus 9

My question is: Why would they compare an item from the query to itself?

In the first case, before the -- separator, they do four comparisons. Two of them compare an entry to itself ("Strings: Bravo versus Bravo"). Why?

In the second case, there should not ever be a need for resorting to comparing the Q properties (integers); for there are no duplicates (wrt. ordinal comparison) in the P values, so no tie-breaking from ThenBy should be needed ever. Still we see "Ints: 9 versus 9" twice. Why use the ThenBy comparer with identical arguments?

Note: Any comparer has to return 0 upon comparing something to itself. So unless the algorithm just wants to check if we implemented a comparer correctly (which it will never be able to do fully anyway), what is going on?

Be aware: There are no duplicates in the elements yielded by the queries in my examples.

I saw the same issue with another example with more entries yielded from the query. Above I just give a small example. This happens with an even number of elements yielded, as well.

like image 499
Jeppe Stig Nielsen Avatar asked Aug 29 '17 07:08

Jeppe Stig Nielsen


People also ask

What is difference between OrderBy and ThenBy in LINQ?

Generally, ThenBy method is used with the OrderBy method. The OrderBy() Method, first sort the elements of the sequence or collection in ascending order after that ThenBy() method is used to again sort the result of OrderBy() method in ascending order.

How does a LINQ work?

LINQ is a data querying API that provides querying capabilities to . NET languages with a syntax similar to a SQL. LINQ queries use C# collections to return data. LINQ in C# is used to work with data access from sources such as objects, data sets, SQL Server, and XML.

What is OrderBy in LINQ?

Sorts the elements of a sequence in ascending order according to a key. OrderBy<TSource,TKey>(IEnumerable<TSource>, Func<TSource,TKey>, IComparer<TKey>) Sorts the elements of a sequence in ascending order by using a specified comparer.

Is LINQ OrderBy ascending?

In LINQ, the OrderBy operator is used to sort the list/ collection values in ascending order. In LINQ, if we use order by the operator by default, it will sort the list of values in ascending order. We don't need to add any ascending condition in the query statement.


1 Answers

In the reference source of the QuickSort method used by OrderBy you can see these two lines:

while (i < map.Length && CompareKeys(x, map[i]) > 0) i++;
while (j >= 0 && CompareKeys(x, map[j]) < 0) j--;

These while loops run until they find an element that is no longer "greater" (resp. "less") than the one x points to. So they will break when the identical element is compared.

I can't prove it mathematical, but I guess to avoid comparing identical elements would make the algorithm more complicated and introduce overhead that would impact performance more than this single comparison.
(Note that your comparer should be implemented clever enough to quickly return 0 for identical elements)

like image 188
René Vogt Avatar answered Oct 07 '22 16:10

René Vogt