I get when one would return an IEnumerable
from a method—when there's value in deferred execution. And returning a List
or IList
should pretty much be only when the result is going to be modified, otherwise I'd return an IReadOnlyCollection
, so the caller knows what he's getting isn't intended for modification (and this lets the method even reuse objects from other callers).
However, on the parameter input side, I'm a little less clear. I could take an IEnumerable
, but what if I need to enumerate more than once?
The saying "Be conservative in what you send, be liberal in what you accept" suggests taking an IEnumerable
is good, but I'm not really sure.
For example, if there are no elements in the following IEnumerable
parameter, a significant amount of work can be saved in this method by checking .Any()
first, which requires ToList()
before that to avoid enumerating twice.
public IEnumerable<Data> RemoveHandledForDate(IEnumerable<Data> data, DateTime dateTime) {
var dataList = data.ToList();
if (!dataList.Any()) {
return dataList;
}
var handledDataIds = new HashSet<int>(
GetHandledDataForDate(dateTime) // Expensive database operation
.Select(d => d.DataId)
);
return dataList.Where(d => !handledDataIds.Contains(d.DataId));
}
So I'm wondering what is the best signature, here? One possibility is IList<Data> data
, but accepting a list suggests that you plan to modify it, which is not correct—this method doesn't touch the original list, so IReadOnlyCollection<Data>
seems better.
But IReadOnlyCollection
forces callers to do ToList().AsReadOnly()
every time which gets a bit ugly, even with a custom extension method .AsReadOnlyCollection
. And that's not being liberal in what is accepted.
What is best practice in this situation?
This method is not returning an IReadOnlyCollection
because there may be value in the final Where
using deferred execution as the whole list is not required to be enumerated. However, the Select
is required to be enumerated because the cost of doing .Contains
would be horrible without the HashSet
.
I don't have a problem with calling ToList
, it just occurred to me that if I need a List
to avoid multiple enumeration, why do I not just ask for one in the parameter? So the question here is, if I don't want an IEnumerable
in my method, should I really accept one in order to be liberal (and ToList
it myself), or should I put the burden on the caller to ToList().AsReadOnly()
?
Further Information for those unfamiliar with IEnumerables
The real problem here is not the cost of Any()
vs. ToList()
. I understand that enumerating the entire list costs more than doing Any()
. However, assume the case that the caller will consume all items in the return IEnumerable
from the above method, and assume that the source IEnumerable<Data> data
parameter comes from the result of this method:
public IEnumerable<Data> GetVeryExpensiveDataForDate(DateTime dateTime) {
// This query is very expensive no matter how many rows are returned.
// It costs 5 seconds on each `.GetEnumerator` call to get 1 value or 1000
return MyDataProvider.Where(d => d.DataDate == dateTime);
}
Now if you do this:
var myData = GetVeryExpensiveDataForDate(todayDate);
var unhandledData = RemoveHandledForDate(myData, todayDate);
foreach (var data in unhandledData) {
messageBus.Dispatch(data); // fully enumerate
)
And if RemovedHandledForDate
does Any
and does Where
, you'll incur the 5 second cost twice, instead of once. This is why you should always take extreme pains to avoid enumerating an IEnumerable
more than once. Do not rely on your knowledge that in fact it's harmless, because some future hapless developer may call your method some day with a newly implemented IEnumerable
you never thought of, which has different characteristics.
The contract for an IEnumerable
says that you can enumerate it. It does NOT promise anything about the performance characteristics of doing so more than once.
In fact, some IEnumerables
are volatile and won't return any data upon a subsequent enumeration! Switching to one would be a totally breaking change if combined with multiple enumeration (and a very hard to diagnose one if the multiple enumeration was added later).
Don't do multiple enumeration of an IEnumerable.
If you accept an IEnumerable parameter, you are in effect promising to enumerate it exactly 0 or 1 times.
You use IEnumerable when you want to loop through the items in a collection. IList is when you want to add, remove, and access the list contents out of order.
IList doesn't support further filtering. IEnumerable exists in System. Collections Namespace. IEnumerable is a forward only collection, it can't move backward and between the items.
ICollection<T> is an interface that exposes collection semantics such as Add() , Remove() , and Count . Collection<T> is a concrete implementation of the ICollection<T> interface. IList<T> is essentially an ICollection<T> with random order-based access.
Read-only collections, dictionaries, and lists in . IReadOnlyCollection<Product> data = products; int numOfRecords = data. Count; The IReadOnlyDictionary interface provides a read-only view of a dictionary and represents a read-only collection of key/value pairs.
IReadOnlyCollection<T>
adds to IEnumerable<T>
a Count
property and the corresonding promise that there is no deferred execution. It would be the appropriate parameter to ask for, if the parameter is where you want to tackle this problem.
However, I suggest asking for IEnumerable<T>
, and calling ToList()
in the implementation itself instead.
Observation: Both approaches have the drawback that the multiple enumeration may at some point be refactored away, rendering the parameter change or ToList()
call redundant, which we may overlook. I do not think this can be avoided.
The case does speak for calling ToList()
in the method body: Since the multiple enumeration is an implementation detail, avoiding it should be an implementation detail as well. This way, we avoid affecting the API. We also avoid changing back the API if the multiple enumeration ever gets refactored away. We also avoid propagating the requirement through a chain of methods, that otherwise might all decide to ask for an IReadOnlyCollection<T>
, only because of our multiple enumeration.
If you are concerned with the overhead of creating extra lists (when the output already is a list or so), Resharper suggests the following approach:
param = param as IList<SomeType> ?? param.ToList();
Of course, we can do even better, because we only need to protect against deferred execution - no need for a full-blown IList<T>
:
param = param as IReadOnlyCollection<SomeType> ?? param.ToList();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With