
IEnumerable Where() and ToList() - What do they really do?

Tags: c#, .net, linq

I was wondering what exactly the Where() and ToList() methods are doing. Specifically, does Where() modify the existing object in memory, or does it return a new object?

Ok, looking at the following code, say I have a skeleton log class.

public class Log
{
    // Text of the log entry.
    public string Message { get; set; }
    public string CreatedByUserId { get; set; }
    public string ModifiedByUserId { get; set; }
}

In my business logic, say I only want logs created or modified by a certain user. This is going to be accomplished with a method: FilterLogsAccordingToUserId().

public IEnumerable<Log> FilterLogsAccordingToUserId(IEnumerable<Log> logs, string userId)
{
    // The ID properties are strings, so compare against the string directly.
    return logs.Where(x => x.CreatedByUserId == userId ||
                           x.ModifiedByUserId == userId).ToList();
}

In this situation, is Where() modifying the IEnumerable<Log> by removing all objects that don't match the condition, or is it grabbing all matching objects, converting them to a list in memory, and then returning that new object?

If it is the second possibility, am I right to be concerned about performance if a sufficiently large list of logs is passed to the function?

asked Apr 15 '14 17:04 by Alexander Matusiak


1 Answer

Let's take the two methods separately.

Where

This one will return a new object that, when enumerated, will filter the original collection by the predicate.

It will in no way change the original collection, but it will be linked to it.

It also uses deferred execution: nothing is filtered until you actually enumerate the result, and every time you enumerate it, it goes back to the original collection and filters it again.

This means that if you change the original collection, the filtered result of it will change accordingly.

Here is a simple LINQPad program that demonstrates:

void Main()
{
    var original = new List<int>(new[] { 1, 2, 3, 4 });
    var filtered = original.Where(i => i > 2);
    original.Add(5);
    filtered.Dump();
    original.Add(6);
    filtered.Dump();
}

Output:

3, 4, 5 (first Dump)
3, 4, 5, 6 (second Dump)

As you can see, adding more elements that satisfy the filter condition to the original collection will make those elements appear in the filtered collection as well.
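
The first two claims above can also be checked without LINQPad. The following is a minimal sketch, not part of the original answer; the class name WhereDemo and the calls counter are illustrative names. It shows that building the query does not run the predicate, and that enumerating it leaves the source list untouched:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereDemo
{
    // Returns: predicate calls before enumeration, source size afterwards, matches found.
    public static (int CallsBefore, int SourceCount, int Matches) Run()
    {
        var original = new List<int> { 1, 2, 3, 4 };
        int calls = 0;

        // Building the query does not invoke the predicate.
        IEnumerable<int> filtered = original.Where(i => { calls++; return i > 2; });
        int callsBefore = calls;

        // Enumerating it does; Count() walks the whole source once.
        int matches = filtered.Count();

        return (callsBefore, original.Count, matches);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"calls before enumeration: {result.CallsBefore}"); // 0
        Console.WriteLine($"items in original: {result.SourceCount}");        // 4
        Console.WriteLine($"items matching: {result.Matches}");               // 2
    }
}
```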

ToList

This will create a new list object, populate it with the elements of the collection, and return that list.

ToList executes immediately: once you have that list, it is completely separate from the original collection.

Note that the objects in that list may still be shared with the original collection. ToList does not make new copies of the elements themselves; only the list is new.

Here is a simple LINQPad program that demonstrates:

void Main()
{
    var original = new List<int>(new[] { 1, 2, 3, 4 });
    var filtered = original.Where(i => i > 2).ToList();
    original.Add(5);

    original.Dump();
    filtered.Dump();
}

Output:

original: 1, 2, 3, 4, 5
filtered: 3, 4

Here you can see that once we've created that list, it doesn't change if the original collection changes.

You can think of the Where method as being linked to the original collection, whereas ToList will simply return a new list with the elements and not be linked to the original collection.
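
The point about elements still being shared only shows up with reference types, so the int examples above cannot demonstrate it. Here is a small sketch under that assumption; the Entry class and SharedReferenceDemo are hypothetical names introduced for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical reference type, used only for this illustration.
public class Entry
{
    public string Text { get; set; } = "";
}

public static class SharedReferenceDemo
{
    public static (string TextInOriginal, int OriginalCount, int CopyCount) Run()
    {
        var original = new List<Entry> { new Entry { Text = "first" } };
        List<Entry> copy = original.ToList();

        // The list is new: adding to the copy leaves the original alone.
        copy.Add(new Entry { Text = "second" });

        // The elements are shared: this change is visible through both lists.
        copy[0].Text = "changed";

        return (original[0].Text, original.Count, copy.Count);
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine(r.TextInOriginal); // changed
        Console.WriteLine(r.OriginalCount);  // 1
        Console.WriteLine(r.CopyCount);      // 2
    }
}
```

So for the Log class in the question, a list produced by ToList() holds references to the same Log instances as the source.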

Now, let's look at your final question. Should you be worried about performance? Well, this is a rather large topic. Yes, you should care about performance, but not to the point of worrying about it in every line of code you write.

If you give a large collection to a Where call, every time you enumerate the results of the Where call, you will enumerate the original large collection and filter it. Even if the filter lets only a few of those elements through, it will still walk the entire original collection every time you enumerate the result.

On the other hand, doing a ToList on something large will also create a large list.
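
The trade-off can be made visible by counting predicate calls. This is a rough sketch with made-up sizes (1000 items, 3 passes) and an illustrative class name, EnumerationCostDemo; it compares enumerating a deferred Where result several times against enumerating a list that was materialized once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCostDemo
{
    public static (int DeferredCalls, int MaterializedCalls) Run()
    {
        List<int> source = Enumerable.Range(1, 1000).ToList();

        // Deferred: each pass re-runs the filter over all 1000 items.
        int deferredCalls = 0;
        IEnumerable<int> deferred = source.Where(i => { deferredCalls++; return i > 990; });
        for (int pass = 0; pass < 3; pass++)
        {
            foreach (var item in deferred) { }
        }

        // Materialized: the filter runs once, later passes only walk the 10 results.
        int materializedCalls = 0;
        List<int> materialized = source.Where(i => { materializedCalls++; return i > 990; }).ToList();
        for (int pass = 0; pass < 3; pass++)
        {
            foreach (var item in materialized) { }
        }

        return (deferredCalls, materializedCalls);
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine($"deferred predicate calls: {r.DeferredCalls}");         // 3000
        Console.WriteLine($"materialized predicate calls: {r.MaterializedCalls}"); // 1000
    }
}
```

The materialized version pays for the extra list in memory instead, which is the other half of the trade-off.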

Is this going to be a performance problem?

Who can tell, but for all things performance, here's my number 1 answer:

  1. First know that you have a problem
  2. Secondly measure your code using the appropriate (memory, cpu time, etc.) tool to figure out where the performance problem is
  3. Fix it
  4. Return to number 1
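
For step 2, a crude first measurement can be taken with System.Diagnostics.Stopwatch. The sketch below is only an assumption about what such a measurement might look like: the MeasureDemo class, the Time helper, and the generated test data are all invented for illustration, and a single timed run ignores JIT warm-up and garbage collection, so a profiler or a benchmarking tool is the better choice for real decisions.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class MeasureDemo
{
    // Runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        // Made-up data: one million user IDs in the range "0" to "99".
        var userIds = Enumerable.Range(0, 1_000_000)
                                .Select(i => (i % 100).ToString())
                                .ToList();

        TimeSpan elapsed = Time(() => userIds.Where(id => id == "42").ToList());
        Console.WriteLine($"Where + ToList: {elapsed.TotalMilliseconds} ms");
    }
}
```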

Too often you will see programmers fret over a piece of code, thinking it will incur a performance problem, only for its cost to be dwarfed by the slow user looking at the screen wondering what to do next, or by the download time of the data, or by the time it takes to write the data to disk, or whatnot.

First you know, then you fix.

answered Sep 22 '22 13:09 by Lasse V. Karlsen