Why does IEnumerator<T> affect the state of IEnumerable<T> even if the enumerator never reached the end?

I am curious why the following throws an exception (text reader closed) on the assignment to "last":

IEnumerable<string> textRows = File.ReadLines(sourceTextFileName);
IEnumerator<string> textEnumerator = textRows.GetEnumerator();

string first = textRows.First();
string last = textRows.Last();

However, the following executes fine:

IEnumerable<string> textRows = File.ReadLines(sourceTextFileName);

string first = textRows.First();
string last = textRows.Last();

IEnumerator<string> textEnumerator = textRows.GetEnumerator();

What is the reason for the different behavior?

Matt asked Dec 26 '12 10:12




1 Answer

You've discovered a bug in the framework, as far as I can tell. It's reasonably subtle, because of the interaction of a few things:

  • When you call ReadLines(), the file is actually opened. Personally, I think of this as a bug in itself; I'd expect and hope that it would be lazy - only opening the file when you try to start iterating over it.
  • When you call GetEnumerator() the first time on the return value of ReadLines, it will actually return the same reference.
  • When First() calls GetEnumerator(), it will create a clone. This will share the same StreamReader as textEnumerator.
  • When First() disposes its clone, it will dispose of the StreamReader, and set its variable to null. This doesn't affect the variable within the original, which now refers to a disposed StreamReader.
  • When Last() calls GetEnumerator(), it will create a clone of the original object, complete with the disposed StreamReader. It then tries to read from that reader, and throws an exception.

Now compare this with your second version:

  • When First() calls GetEnumerator(), the original reference is returned, complete with open reader.
  • When First() then calls Dispose(), the reader will be disposed and the variable set to null.
  • When Last() calls GetEnumerator(), a clone will be created - but because the value it's cloning has a null reference, a new StreamReader is created, so it's able to read the file with no problems. It then disposes of the clone, which closes the reader.
  • When GetEnumerator() is called, a second clone of the original object is created, opening yet another StreamReader - again, no problems there.

So basically, the problem in the first snippet is that you're calling GetEnumerator() a second time (in First()) without having disposed of the first object.
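A minimal sketch of one way to sidestep this, assuming the analysis above is right: give each consumer its own File.ReadLines call, so that no two enumerators are clones of the same object. The class name, temp file and sample lines are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class SeparateSequences
{
    static void Main()
    {
        // Hypothetical sample file, created here so the sketch is self-contained.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "alpha", "beta", "gamma" });

        // Each File.ReadLines call returns its own sequence object, so the
        // enumerator held here shares no reader with First() or Last().
        using (IEnumerator<string> textEnumerator = File.ReadLines(path).GetEnumerator())
        {
            string first = File.ReadLines(path).First();
            string last = File.ReadLines(path).Last();
            Console.WriteLine(first + " / " + last); // prints "alpha / gamma"
        }

        File.Delete(path);
    }
}
```

The cost is that the file is opened once per call, which is usually acceptable for small files.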

Here's another example of the same problem:

using System;
using System.IO;
using System.Linq;

class Test
{
    static void Main()
    {
        var lines = File.ReadLines("test.txt");
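        // The nested "from y in lines" calls GetEnumerator() on lines again
        // while the outer enumerator is still in use.
        var query = from x in lines
                    from y in lines
                    select x + "/" + y;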
        foreach (var line in query)
        {
            Console.WriteLine(line);
        }
    }
}

You could fix this by calling File.ReadLines twice - or by using a genuinely lazy implementation of ReadLines, like this:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Test
{
    static void Main()
    {
        var lines = ReadLines("test.txt");
        var query = from x in lines
                    from y in lines
                    select x + "/" + y;
        foreach (var line in query)
        {
            Console.WriteLine(line);
        }
    }

    static IEnumerable<string> ReadLines(string file)
    {
        using (var reader = File.OpenText(file))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}

In the latter code, a new StreamReader is opened each time GetEnumerator() is called - so the result is each pair of lines in test.txt.
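As a rough sketch, the same lazy ReadLines can be tried against the first snippet from the question. The class name, temp file and sample lines are invented for this example. The enumerator obtained up front stays usable because each enumerator opens its own StreamReader when iteration starts.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class LazyFirstLast
{
    // Same lazy iterator as above: the file is opened only when iteration
    // starts, and every enumerator gets its own StreamReader.
    public static IEnumerable<string> ReadLines(string file)
    {
        using (var reader = File.OpenText(file))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    static void Main()
    {
        // Hypothetical sample data so the sketch runs on its own.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "alpha", "beta", "gamma" });

        IEnumerable<string> textRows = ReadLines(path);
        using (IEnumerator<string> textEnumerator = textRows.GetEnumerator())
        {
            string first = textRows.First();
            string last = textRows.Last();

            // The enumerator obtained earlier is unaffected by First() and Last().
            textEnumerator.MoveNext();
            Console.WriteLine(first + " / " + last + " / " + textEnumerator.Current);
            // prints "alpha / gamma / alpha"
        }

        File.Delete(path);
    }
}
```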

Jon Skeet answered Oct 09 '22 17:10