
What is "further filtering" for iterators?

I've always preferred to use IEnumerable instead of List, for obvious reasons, where applicable. In the current project, I bumped into IList, and after I googled it, the Internet told me that there's no significant difference between them except for a single property: the support for further filtering.

Since I wasn't certain what that meant with regard to iterators in C#, I googled that too. Any possibly relevant answer drowns in the gazillions of hits on "supports further filtering" telling me that IEnumerable does it, while IList doesn't.

So I'm asking a two-part question here.

  1. What does support for further filtering mean?
  2. How can I google such terms (in a more general sense)?

As it's a general observation based on many posts, I can't list them all here. One example might be this particular link.

Konrad Viltersten asked Aug 14 '14 07:08


1 Answer

There is no such thing as "further filtering".

Filtering collections is usually done with the Enumerable.Where extension method (LINQ), which is defined for the IEnumerable<T> interface. Since IList<T> inherits from IEnumerable<T>, you can call Where on both interfaces (calling Where on an IList<T> actually calls the same Enumerable.Where extension method). So, in both cases, the same method is called, and the type of the result is an IEnumerable<T> (not an IList<T>, even when applied to a list). This might be the source of the confusion ("you cannot filter the IList further since you don't have it anymore?"), but nothing stops you from filtering the resulting IEnumerable<T> again, or even writing your own extension method which would create a new List<T> on each call.
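To illustrate the last point, here is a minimal sketch of such an extension method. The name WhereToList is made up for this example and is not part of the framework; it filters eagerly and returns a new list, so a chain of calls stays within IList<T>.

```csharp
using System;
using System.Collections.Generic;

public static class ListFilterExtensions
{
    // Hypothetical helper (not part of the BCL): filters eagerly and
    // returns a new List<T>, leaving the source list unmodified.
    public static IList<T> WhereToList<T>(this IList<T> source, Func<T, bool> predicate)
    {
        var result = new List<T>();
        foreach (var item in source)
        {
            if (predicate(item))
                result.Add(item);
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        IList<int> numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

        // Each call returns an IList<int>, so the chain never leaves IList<T>.
        IList<int> filtered = numbers
            .WhereToList(n => n % 2 == 0)   // 2, 4, 6, 8, 10
            .WhereToList(n => n > 4);       // 6, 8, 10

        Console.WriteLine(string.Join(", ", filtered)); // prints "6, 8, 10"
    }
}
```

The trade-off is that every step allocates a new list, which is exactly what the lazy Where avoids.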

The post linked to in the question is of low quality and shouldn't be trusted.

For a detailed explanation, see below.

You can filter elements from both interfaces in pretty much the same way, and you will generally use the IEnumerable<T> extension methods (i.e. LINQ) in both cases, since IList<T> inherits from IEnumerable<T>. You can chain as many Where calls as you like in both cases:

// `items` is an `IEnumerable<T>`, so we can call the `Where` extension method.
// Each call creates a new instance, and keeps the previous one unmodified.
IEnumerable<T> items = GetEnumerableItems();
var filteredItems = items
    .Where(i => i.Name == "Jane")      // returns a new IEnumerable<T>
    .Where(i => i.Gender == "Female")  // returns a new IEnumerable<T>
    .Where(i => i.Age == 30);          // returns a new IEnumerable<T>

// `list` is an `IList<T>`, which also inherits from `IEnumerable<T>`.
// Calling `Where` on a list will also not modify the original list.
IList<T> list = GetEnumerableItems().ToList();
var filteredList = list
    .Where(i => i.Name == "John")      // returns a new IEnumerable<T>
    .Where(i => i.Gender == "Male")    // returns a new IEnumerable<T>
    .Where(i => i.Age == 30)           // returns a new IEnumerable<T>
    .ToList();                         // returns a new List<T> (optional)

Googling for the term returns several articles mentioning it (like this, or this). They all seem to copy the same source, which looks like plagiarism without any actual reasoning behind it. The only thing that comes to my mind is that applying Where to an IEnumerable<T> returns a new (filtered) IEnumerable<T>, to which you can again apply Where (filter it "further"). But that is really vague, since applying Where to an IList<T> will not prevent you from filtering the result again, even though the resulting interface is an IEnumerable<T>. As mentioned in the comments, it is worth noting that the List<T> class, as a concrete implementation of IList<T>, exposes a FindAll method which returns a new filtered concrete List<T> (and can be "further filtered"), but that's not part of IList<T>.
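A small sketch of the two routes side by side may make the type difference concrete. The class and method names (FilterTypes, ViaWhere, ViaFindAll) are made up for illustration; only Where and FindAll are real framework methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterTypes
{
    // Where is a LINQ extension method on IEnumerable<T>, so it is available on IList<T>.
    // Its static result type is IEnumerable<int>, which can be filtered again.
    public static IEnumerable<int> ViaWhere(IList<int> source) =>
        source.Where(n => n % 2 == 0).Where(n => n > 4);

    // FindAll is declared on the concrete List<T> class, so the parameter
    // has to be List<int>; an IList<int> parameter would not compile here.
    public static List<int> ViaFindAll(List<int> source) =>
        source.FindAll(n => n % 2 == 0).FindAll(n => n > 4);
}

public static class Program
{
    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        Console.WriteLine(string.Join(", ", FilterTypes.ViaWhere(numbers)));   // prints "6, 8, 10"
        Console.WriteLine(string.Join(", ", FilterTypes.ViaFindAll(numbers))); // prints "6, 8, 10"
    }
}
```

Both chains yield the same elements; they differ only in the static type you get back and in when the work happens.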

The main difference between repeatedly filtering an IEnumerable<T> and filtering a list into a new list (e.g. using FindAll) is that the latter needs to create a new List<T> instance in each step, while Where on an IEnumerable<T> uses deferred execution and takes no extra memory apart from a small amount of state for each Where call. And again, just to avoid confusion: if you call Where on a List<T>, you still get the benefits of IEnumerable<T> laziness.
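Deferred execution can be observed directly by counting how often the predicate runs. The following is a rough sketch, with DeferredDemo and Run being names invented for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static (int CallsBefore, int LazyCount, int EagerCount, int CallsAfter) Run()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5 };
        int predicateCalls = 0;

        // Building the query runs nothing yet; Where only stores the source and the predicate.
        IEnumerable<int> lazy = numbers.Where(n => { predicateCalls++; return n > 2; });
        int callsBefore = predicateCalls;

        // FindAll runs immediately and allocates a new List<int> holding 3, 4, 5.
        List<int> eager = numbers.FindAll(n => n > 2);

        // The lazy query reads the source only when enumerated, so it sees the added 6.
        numbers.Add(6);
        int lazyCount = lazy.Count();

        return (callsBefore, lazyCount, eager.Count, predicateCalls);
    }
}

public static class Program
{
    public static void Main()
    {
        // No predicate calls before enumeration, 4 lazy matches (3, 4, 5, 6),
        // 3 eager matches (snapshot), 6 predicate calls after enumeration.
        Console.WriteLine(DeferredDemo.Run()); // prints "(0, 4, 3, 6)"
    }
}
```

Note that the lazy query re-runs the predicate every time it is enumerated; call ToList() once if you need a stable snapshot.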

Actual differences:

IList (or rather IList<T>, which I presume you're referring to) represents a collection of objects that can be individually accessed by index. This means that you can efficiently (in O(1) time for an array-backed implementation such as List<T>) get the value at a certain position, as well as the list's length. The "bad thing" is that (presuming it's implemented as a List<T> under the hood) you need to keep the entire collection in memory.

The "only thing" IEnumerable (i.e. its generic counterpart IEnumerable<T>) can do is iterate over (zero or more) items. It doesn't have a notion of an index (you cannot "jump" to an index without actually iterating over, or skipping, all items before that item). You also cannot get the length efficiently in the general case without actually counting the items every time. On the other hand, an IEnumerable<T> can be lazily evaluated, meaning that its elements don't have to exist in memory until they are about to be consumed. It can wrap a database table underneath, with billions of rows, fetched from the disk as you iterate it. It can even be an infinite collection.
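As a minimal sketch of these differences, the iterator below produces an endless sequence that is only materialized as far as the consumer asks. Sequences and NaturalNumbers are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Sequences
{
    // An infinite sequence: each value is produced only when the consumer asks for it.
    public static IEnumerable<long> NaturalNumbers()
    {
        long n = 0;
        while (true)
            yield return n++;
    }
}

public static class Program
{
    public static void Main()
    {
        // Only as many elements are generated as the query needs.
        var firstEvenSquares = Sequences.NaturalNumbers()
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .Take(5);
        Console.WriteLine(string.Join(", ", firstEvenSquares)); // prints "0, 4, 16, 36, 64"

        // No indexer on IEnumerable<T>: ElementAt has to walk past the preceding items here.
        Console.WriteLine(Sequences.NaturalNumbers().ElementAt(1000)); // prints "1000"

        // An IList<T> exposes an indexer and Count directly, but holds all items in memory.
        IList<long> list = Sequences.NaturalNumbers().Take(5).ToList();
        Console.WriteLine(list[3]);    // prints "3"
        Console.WriteLine(list.Count); // prints "5"
    }
}
```

Calling Count() or ToList() on the unbounded sequence itself would never finish, which is why a bounding operator such as Take comes first.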

Groo answered Oct 08 '22 10:10