Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Enumerating List faster than IList, ICollection and IEnumerable

Tags:

c#

list

Recently I have been researching some conventions in writing functions that return a collection. I was wondering whether the function that actually uses a List<int> should return a List<int> or rather IList<int>, ICollection<int> or IEnumerable<int>. I created some tests for performance and I was quite surprised with the results.

static List<int> list = MakeList();
static IList<int> iList = MakeList();
static ICollection<int> iCollection = MakeList();
static IEnumerable<int> iEnumerable = MakeList();

public static TimeSpan Measure(Action f)
{
    var stopWatch = new Stopwatch();
    stopWatch.Start();
    f();
    stopWatch.Stop();
    return stopWatch.Elapsed;
}

public static List<int> MakeList()
{
    var list = new List<int>();
    for (int i = 0; i < 100; ++i)
    {
        list.Add(i);
    }
    return list;
}

public static void Main()
{
    var time1 = Measure(() => { // Measure time of enumerating List<int>
        for (int i = 1000000; i > 0; i-- ) {
            foreach (var item in list)
            {
                var x = item;
            }
        }
    });
    Console.WriteLine($"List<int> time: {time1}");

    var time2 = Measure(() => { // IList<int>
        for (int i = 1000000; i > 0; i-- ) {
            foreach (var item in iList)
            {
                var x = item;
            }
        }
    });
    Console.WriteLine($"IList<int> time: {time2}");

    var time3 = Measure(() => { // ICollection<int>
        for (int i = 1000000; i > 0; i-- ) {
            foreach (var item in iCollection)
            {
                var x = item;
            }
        }
    });
    Console.WriteLine($"ICollection<int> time: {time3}");

    var time4 = Measure(() => { // IEnumerable<int>
        for (int i = 1000000; i > 0; i-- ) {
            foreach (var item in iEnumerable)
            {
                var x = item;
            }
        }
    });
    Console.WriteLine($"IEnumerable<int> time: {time4}");
}

Output:

List<int> time: 00:00:00.7976577
IList<int> time: 00:00:01.5599382
ICollection<int> time: 00:00:01.7323919
IEnumerable<int> time: 00:00:01.6075277

I've tried different order of the measures or making the MakeList() return one of the above interfaces but all of it only confirms that returning a List<int> and processing it as a List<int> is about twice as fast as with the interfaces.

However various sources, including this answer claim that you should never return List<> and always use an interface.

So my question is:

  • Why is processing a List<int> about twice as fast as the interfaces?
  • What should we return from a function and how to manage the code if we care about performance?
like image 977
Rasmond Avatar asked Oct 16 '19 10:10

Rasmond


People also ask

Is List faster than IEnumerable?

IEnumerable is conceptually faster than List because of the deferred execution. Deferred execution makes IEnumerable faster because it only gets the data when needed. Contrary to Lists having the data in-memory all the time.

Which is better IEnumerable or List?

IEnumerable is best to query data from in-memory collections like List, Array etc. IEnumerable doesn't support add or remove items from the list. Using IEnumerable we can find out the no of elements in the collection after iterating the collection.

Should I use IList or IEnumerable?

You use IEnumerable when you want to loop through the items in a collection. IList is when you want to add, remove, and access the list contents out of order. Save this answer.

What is the difference between ICollection and IEnumerable?

An ICollection is another type of collection, which derives from IEnumerable and extends it's functionality to modify (Add or Update or Remove) data. ICollection also holds the count of elements in it and we does not need to iterate over all elements to get total number of elements.


1 Answers

Why is processing a List<int> about twice as fast as the interfaces?

Great question. When attempting to foreach something, C# first checks to see whether the type of the collection already has a method called GetEnumerator that returns a type that has MoveNext and Current. If it does, it calls those directly. If not, then it falls back to using IEnumerable<T> or IEnumerable and IEnumerator<T> or IEnumerator to get the enumerator so that it can call MoveNext and Current.

This design choice was made for two reasons. First, in the world of C# 1.0 before generics, it meant that you could call a Current that returned int; IEnumerator.Current of course is object and so would box the int, which is both a speed and a memory penalty. Second, it meant that authors of collections could do experiments to figure out which implementation of MoveNext and Current had the best performance.

The implementors of List<T> did exactly that; if you examine GetEnumerator on List<T> you will discover something interesting: it returns a mutable value type. Yes, mutable value types are considered an easily-abused bad practice. But because 99.999% of the usages of this overload of GetEnumerator are called on your behalf by foreach, the vast majority of time you never even notice that there's a mutable value for you to abuse, and therefore do not abuse it.

(NOTE: The takeaway of the preceding paragraph should not be "use mutable value types because they are fast". The takeaway should be understand the usage patterns of your users and then design a safe, efficient tool that meets their needs. Usually a mutable value type is not the right tool.)

Anyways, long story short, we avoid all kinds of virtual calls, interface type checks, and so on, by binding directly to methods on mutable value types when iterating something known at compile time to be List<T>.

What should we return from a function and how to manage the code if we care about performance?

If you care about speed performance then you should be concentrating on the slowest thing in the program. Is the slowest thing in your program calling MoveNext on a collection? If so, congratulations, you have a very fast program; MoveNext is the next thing to optimize. But in this case really you should be asking "how do I avoid or delay this loop entirely?" if you're in that boat.

If MoveNext is not the slowest thing in the program then who cares if it is a few nanoseconds slower in a particular implementation? Return the type that is logically the one closest to what the caller wants and needs and don't worry about the tiny penalty.

like image 171
Eric Lippert Avatar answered Sep 23 '22 09:09

Eric Lippert