Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does LINQ .distinct method sort?

Let's say I'm using the LINQ array .Distinct() method. The result is unordered.

Well, everything is "ordered" if you know the logic used to produce the result.

My question is about the result set. Will the resulting array be in the "first distinct" order or perhaps the "last distinct" order?

Can I never count on any order?

This is the old "remove duplicate strings" problem but I'm looking into the LINQ solution.

like image 294
Matthew Avatar asked Nov 05 '10 20:11

Matthew


2 Answers

Assuming you mean LINQ to Objects, it basically keeps a set of all the results it's returned so far, and only yields the "current" item if it hasn't been yielded before. So the results are in the original order, with duplicates removed. Something like this (except with error checking etc):

public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source)
{
    HashSet<T> set = new HashSet<T>();

    foreach (T item in source)
    {
        if (set.Add(item))
        {
            // New item, so yield it
            yield return item;
        }
    }
}

This isn't guaranteed - but I can't imagine any more sensible implementation. This allows Distinct() to be as lazy as it can be - data is returned as soon as it can be, and only the minimum amount of data is buffered.

Relying on this would be a bad idea, but it can be instructive to know how the current implementation (apparently) works. In particular, you can easily observe that it starts returning data before exhausting the original sequence, simply by creating a source which logs when it produces data to be consumed by Distinct, and also logging when you receive data from Distinct.

like image 143
Jon Skeet Avatar answered Sep 18 '22 06:09

Jon Skeet


The docs say:

"The result sequence is unordered."

like image 35
Gabriel Magana Avatar answered Sep 18 '22 06:09

Gabriel Magana