Let's say I'm using the LINQ array .Distinct()
method.
The result is unordered.
Well, everything is "ordered" if you know the logic used to produce the result.
My question is about the result set. Will the resulting array be in the "first distinct" order or perhaps the "last distinct" order?
Can I never count on any order?
This is the old "remove duplicate strings" problem but I'm looking into the LINQ solution.
Assuming you mean LINQ to Objects, it basically keeps a set of all the results it's returned so far, and only yields the "current" item if it hasn't been yielded before. So the results are in the original order, with duplicates removed. Something like this (except with error checking etc):
public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source)
{
HashSet<T> set = new HashSet<T>();
foreach (T item in source)
{
if (set.Add(item))
{
// New item, so yield it
yield return item;
}
}
}
This isn't guaranteed - but I can't imagine any more sensible implementation. This allows Distinct()
to be as lazy as it can be - data is returned as soon as it can be, and only the minimum amount of data is buffered.
Relying on this would be a bad idea, but it can be instructive to know how the current implementation (apparently) works. In particular, you can easily observe that it starts returning data before exhausting the original sequence, simply by creating a source which logs when it produces data to be consumed by Distinct
, and also logging when you receive data from Distinct
.
The docs say:
"The result sequence is unordered."
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With