I came across this implementation in Enumerable.cs by reflector.
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) { //check parameters TSource local = default(TSource); long num = 0L; foreach (TSource local2 in source) { if (predicate(local2)) { local = local2; num += 1L; //I think they should do something here like: //if (num >= 2L) throw Error.MoreThanOneMatch(); //no necessary to continue } } //return different results by num's value }
I think they should break the loop if there are more than 2 items meets the condition, why they always loop through the whole collection? In case of that reflector disassembles the dll incorrectly, I write a simple test:
class DataItem { private int _num; public DataItem(int num) { _num = num; } public int Num { get{ Console.WriteLine("getting "+_num); return _num;} } } var source = Enumerable.Range(1,10).Select( x => new DataItem(x)); var result = source.Single(x => x.Num < 5);
For this test case, I think it will print "getting 0, getting 1" and then throw an exception. But the truth is, it keeps "getting 0... getting 10" and throws an exception. Is there any algorithmic reason they implement this method like this?
EDIT Some of you thought it's because of side effects of the predicate expression, after a deep thought and some test cases, I have a conclusion that side effects doesn't matter in this case. Please provide an example if you disagree with this conclusion.
Yes, I do find it slightly strange especially because the overload that doesn't take a predicate (i.e. works on just the sequence) does seem to have the quick-throw 'optimization'.
In the BCL's defence however, I would say that the InvalidOperation exception that Single throws is a boneheaded exception that shouldn't normally be used for control-flow. It's not necessary for such cases to be optimized by the library.
Code that uses Single
where zero or multiple matches is a perfectly valid possibility, such as:
try { var item = source.Single(predicate); DoSomething(item); } catch(InvalidOperationException) { DoSomethingElseUnexceptional(); }
should be refactored to code that doesn't use the exception for control-flow, such as (only a sample; this can be implemented more efficiently):
var firstTwo = source.Where(predicate).Take(2).ToArray(); if(firstTwo.Length == 1) { // Note that this won't fail. If it does, this code has a bug. DoSomething(firstTwo.Single()); } else { DoSomethingElseUnexceptional(); }
In other words, we should leave the use of Single
to cases when we expect the sequence to contain only one match. It should behave identically to First
but with the additional run-time assertion that the sequence doesn't contain multiple matches. Like any other assertion, failure, i.e. cases when Single
throws, should be used to represent bugs in the program (either in the method running the query or in the arguments passed to it by the caller).
This leaves us with two cases:
Single
to consume the entire sequence anyway to assert our claim. There's no benefit to the 'optimization'. In fact, one could argue that the sample implementation of the 'optimization' provided by the OP will actually be slower because of the check on every iteration of the loop.To sum up, if the 'poor implementation' is biting you performance-wise in production, either:
Single
incorrectly.EDIT: Clarified my point.
EDIT: Here's a valid use of Single, where failure indicates bugs in the calling code (bad argument):
public static User GetUserById(this IEnumerable<User> users, string id) { if(users == null) throw new ArgumentNullException("users"); // Perfectly fine if documented that a failure in the query // is treated as an exceptional circumstance. Caller's job // to guarantee pre-condition. return users.Single(user => user.Id == id); }
Update:
I got some very good feedback to my answer, which has made me re-think. Thus I will first provide the answer that states my "new" point of view; you can still find my original answer just below. Make sure to read the comments in-between to understand where my first answer misses the point.
Let's assume that Single
should throw an exception when it's pre-condition is not met; that is, when Single
detects than either none, or more than one item in the collection matches the predicate.
Single
can only succeed without throwing an exception by going through the whole collection. It has to make sure that there is exactly one matching item, so it will have to check all items in the collection.
This means that throwing an exception early (as soon as it finds a second matching item) is essentially an optimization that you can only benefit from when Single
's pre-condition cannot be met and when it will throw an exception.
As user CodeInChaos says clearly in a comment below, the optimization wouldn't be wrong, but it is meaningless, because one usually introduces optimizations that will benefit correctly-working code, not optimizations that will benefit malfunctioning code.
Thus, it is actually correct that Single
could throw an exception early; but it doesn't have to, because there's practically no added benefit.
I cannot give a technical reason why that method is implemented the way it is, since I didn't implement it. But I can state my understanding of the Single
operator's purpose, and from there draw my personal conclusion that it is indeed badly implemented:
Single
:What is the purpose of Single
, and how is it different from e.g. First
or Last
?
Using the Single
operator basically expresses one's assumption that exactly one item must be returned from the collection:
If you don't specify a predicate, it should mean that the collection is expected to contain exactly one item.
If you do specify a predicate, it should mean that exactly one item in the collection is expected to satisfy that condition. (Using a predicate should have the same effect as items.Where(predicate).Single()
.)
This is what makes Single
different from other operators such as First
, Last
, or Take(1)
. None of those operators have the requirement that there should be exactly one (matching) item.
When should Single
throw an exception?
Basically, when it finds that your assumption was wrong; i.e. when the underlying collection does not yield exactly one (matching) item. That is, when there are zero or more than one items.
When should Single
be used?
The use of Single
is appropriate when your program's logic can guarantee that the collection will yield exactly one item, and one item only. If an exception gets thrown, that should mean that your program's logic contains a bug.
If you process "unreliable" collections, such as I/O input, you should first validate the input before you pass it to Single
. Single
, together with an exception catch
block, is not appropriate for making sure that the collection has only one matching item. By the time you invoke Single
, you should already have made sure that there'll be only one matching item.
The above states my understanding of the Single
LINQ operator. If you follow and agree with this understanding, you should come to the conclusion that Single
ought to throw an exception as early as possible. There is no reason to wait until the end of the (possibly very large) collection, because the pre-condition of Single
is violated as soon as it detects a second (matching) item in the collection.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With