Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bad implementation of Enumerable.Single?

I came across this implementation in Enumerable.cs by reflector.

public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {     //check parameters     TSource local = default(TSource);     long num = 0L;     foreach (TSource local2 in source)     {         if (predicate(local2))         {             local = local2;             num += 1L;             //I think they should do something here like:             //if (num >= 2L) throw Error.MoreThanOneMatch();             //no necessary to continue         }     }     //return different results by num's value } 

I think they should break the loop if there are more than 2 items meets the condition, why they always loop through the whole collection? In case of that reflector disassembles the dll incorrectly, I write a simple test:

class DataItem {    private int _num;    public DataItem(int num)    {       _num = num;    }     public int Num    {       get{ Console.WriteLine("getting "+_num); return _num;}    } }  var source = Enumerable.Range(1,10).Select( x => new DataItem(x)); var result = source.Single(x => x.Num < 5); 

For this test case, I think it will print "getting 0, getting 1" and then throw an exception. But the truth is, it keeps "getting 0... getting 10" and throws an exception. Is there any algorithmic reason they implement this method like this?

EDIT Some of you thought it's because of side effects of the predicate expression, after a deep thought and some test cases, I have a conclusion that side effects doesn't matter in this case. Please provide an example if you disagree with this conclusion.

like image 719
Cheng Chen Avatar asked Jan 28 '11 06:01

Cheng Chen


2 Answers

Yes, I do find it slightly strange especially because the overload that doesn't take a predicate (i.e. works on just the sequence) does seem to have the quick-throw 'optimization'.


In the BCL's defence however, I would say that the InvalidOperation exception that Single throws is a boneheaded exception that shouldn't normally be used for control-flow. It's not necessary for such cases to be optimized by the library.

Code that uses Single where zero or multiple matches is a perfectly valid possibility, such as:

try {      var item = source.Single(predicate);      DoSomething(item); }  catch(InvalidOperationException) {      DoSomethingElseUnexceptional();     } 

should be refactored to code that doesn't use the exception for control-flow, such as (only a sample; this can be implemented more efficiently):

var firstTwo = source.Where(predicate).Take(2).ToArray();  if(firstTwo.Length == 1)  {     // Note that this won't fail. If it does, this code has a bug.     DoSomething(firstTwo.Single());  } else {     DoSomethingElseUnexceptional(); } 

In other words, we should leave the use of Single to cases when we expect the sequence to contain only one match. It should behave identically to Firstbut with the additional run-time assertion that the sequence doesn't contain multiple matches. Like any other assertion, failure, i.e. cases when Single throws, should be used to represent bugs in the program (either in the method running the query or in the arguments passed to it by the caller).

This leaves us with two cases:

  1. The assertion holds: There is a single match. In this case, we want Single to consume the entire sequence anyway to assert our claim. There's no benefit to the 'optimization'. In fact, one could argue that the sample implementation of the 'optimization' provided by the OP will actually be slower because of the check on every iteration of the loop.
  2. The assertion fails: There are zero or multiple matches. In this case, we do throw later than we could, but this isn't such a big deal since the exception is boneheaded: it is indicative of a bug that must be fixed.

To sum up, if the 'poor implementation' is biting you performance-wise in production, either:

  1. You are using Single incorrectly.
  2. You have a bug in your program. Once the bug is fixed, this particular performance problem will go away.

EDIT: Clarified my point.

EDIT: Here's a valid use of Single, where failure indicates bugs in the calling code (bad argument):

public static User GetUserById(this IEnumerable<User> users, string id) {      if(users == null)         throw new ArgumentNullException("users");       // Perfectly fine if documented that a failure in the query      // is treated as an exceptional circumstance. Caller's job       // to guarantee pre-condition.              return users.Single(user => user.Id == id);     } 
like image 128
Ani Avatar answered Oct 19 '22 02:10

Ani


Update:
I got some very good feedback to my answer, which has made me re-think. Thus I will first provide the answer that states my "new" point of view; you can still find my original answer just below. Make sure to read the comments in-between to understand where my first answer misses the point.

New answer:

Let's assume that Single should throw an exception when it's pre-condition is not met; that is, when Single detects than either none, or more than one item in the collection matches the predicate.

Single can only succeed without throwing an exception by going through the whole collection. It has to make sure that there is exactly one matching item, so it will have to check all items in the collection.

This means that throwing an exception early (as soon as it finds a second matching item) is essentially an optimization that you can only benefit from when Single's pre-condition cannot be met and when it will throw an exception.

As user CodeInChaos says clearly in a comment below, the optimization wouldn't be wrong, but it is meaningless, because one usually introduces optimizations that will benefit correctly-working code, not optimizations that will benefit malfunctioning code.

Thus, it is actually correct that Single could throw an exception early; but it doesn't have to, because there's practically no added benefit.


Old answer:

I cannot give a technical reason why that method is implemented the way it is, since I didn't implement it. But I can state my understanding of the Single operator's purpose, and from there draw my personal conclusion that it is indeed badly implemented:

My understanding of Single:

What is the purpose of Single, and how is it different from e.g. First or Last?

Using the Single operator basically expresses one's assumption that exactly one item must be returned from the collection:

  • If you don't specify a predicate, it should mean that the collection is expected to contain exactly one item.

  • If you do specify a predicate, it should mean that exactly one item in the collection is expected to satisfy that condition. (Using a predicate should have the same effect as items.Where(predicate).Single().)

This is what makes Single different from other operators such as First, Last, or Take(1). None of those operators have the requirement that there should be exactly one (matching) item.

When should Single throw an exception?

Basically, when it finds that your assumption was wrong; i.e. when the underlying collection does not yield exactly one (matching) item. That is, when there are zero or more than one items.

When should Single be used?

The use of Single is appropriate when your program's logic can guarantee that the collection will yield exactly one item, and one item only. If an exception gets thrown, that should mean that your program's logic contains a bug.

If you process "unreliable" collections, such as I/O input, you should first validate the input before you pass it to Single. Single, together with an exception catch block, is not appropriate for making sure that the collection has only one matching item. By the time you invoke Single, you should already have made sure that there'll be only one matching item.

Conclusion:

The above states my understanding of the Single LINQ operator. If you follow and agree with this understanding, you should come to the conclusion that Single ought to throw an exception as early as possible. There is no reason to wait until the end of the (possibly very large) collection, because the pre-condition of Single is violated as soon as it detects a second (matching) item in the collection.

like image 37
stakx - no longer contributing Avatar answered Oct 19 '22 00:10

stakx - no longer contributing