I have a rather large file that I want to read starting from a particular line. I found
File.ReadLines(file).Skip(numLines);
which works great. However, I do not understand how this works under the surface. I wrote a couple of basic benchmarks to see whether there was a performance difference compared with the approaches some colleagues had suggested. The methods I tested were:
Using a StreamReader to read through every line up to that point:
public string streamToLine(int lineNumber)
{
    StreamReader reader = new StreamReader(fileName);
    for (int i = 0; i < lineNumber - 1; i++)
    {
        reader.ReadLine();
    }
    string line = reader.ReadLine();
    reader.Close();
    return line;
}
Using File.ReadLines(file) and iterating to the line with an enumerator:
public string readToLine(int lineNumber)
{
    IEnumerator<string> lines = File.ReadLines(fileName).GetEnumerator();
    for (int i = 0; i < lineNumber; i++)
    {
        lines.MoveNext();
    }
    return lines.Current;
}
Using the Skip functionality:
public string skipToLine(int lineNumber)
{
    IEnumerator<string> lines = File.ReadLines(fileName).Skip(lineNumber - 1).GetEnumerator();
    return lines.ToString();
}
I ran each test 10 times over a file with 10 million lines, reading the 9 millionth line, and averaged the time taken:
Stream To Line: 2442.1 ms
Read To Line:   2534.9 ms
Skip To Line:   0 ms
It looks like Skip does not even consider the lines before lineNumber and knows exactly where the 9 millionth line is. Does it somehow infer this from the file? Is there some overhead in the way the other two methods process the lines because they return what is read? Why is there such a big difference?
Basically, the problem is your test. In skipToLine you never call MoveNext() on the enumerator, so Skip hasn't read a single line yet. Iterators are often deferred and streaming, especially in LINQ: building the query and calling GetEnumerator() does no work, and items are only pulled from the source as MoveNext() asks for them.
Incidentally, it is very rare that you need to call GetEnumerator() yourself; the idiomatic way to access such data is via foreach.
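To make the Skip benchmark do real work, something has to call MoveNext(). Here is a minimal sketch, assuming the same 1-based lineNumber convention as the methods in the question (the LineReader class and its method names are just for illustration). FirstOrDefault() calls MoveNext() once, which forces Skip to read and discard the preceding lines; the foreach version makes the same iteration explicit. I would expect this to land in the same ballpark as the other two methods, since all of them have to read every line before the target, but measure it yourself.

```csharp
using System.IO;
using System.Linq;

public static class LineReader
{
    // Returns the 1-based line `lineNumber`, or null if the file is shorter.
    // FirstOrDefault() calls MoveNext(), so Skip really reads lineNumber - 1 lines.
    public static string SkipToLine(string fileName, int lineNumber)
    {
        return File.ReadLines(fileName).Skip(lineNumber - 1).FirstOrDefault();
    }

    // Same result using foreach instead of a hand-written enumerator loop.
    public static string ForeachToLine(string fileName, int lineNumber)
    {
        int current = 0;
        foreach (string line in File.ReadLines(fileName))
        {
            current++;
            if (current == lineNumber)
            {
                // Returning from inside foreach disposes the enumerator, closing the file.
                return line;
            }
        }
        return null; // the file has fewer than lineNumber lines
    }
}
```

Both methods dispose the enumerator (FirstOrDefault and foreach do that for you), which also closes the file; the hand-written GetEnumerator() calls in the question never dispose theirs.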
If you want to see this in action:
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        using (var iter = GetData().GetEnumerator())
        {
            Console.WriteLine("Have iterator");
            while (iter.MoveNext())
            {
                Console.WriteLine(iter.Current);
            }
            Console.WriteLine("Done");
        }
    }

    // Iterator method: none of this body runs until the first MoveNext().
    static IEnumerable<int> GetData()
    {
        Console.WriteLine("Before doing anything");
        yield return 1;
        yield return 2;
        yield return 3;
        Console.WriteLine("After doing everything");
    }
}
You should notice that "Have iterator" is written before "Before doing anything", which tells us that an iterator can exist without having done any work yet. It is the first call to MoveNext() that makes it print.
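The same thing happens with LINQ's Skip. Here is a sketch using a hypothetical CountingSource iterator that records in a static counter how many items it has handed out: creating the Skip query and calling GetEnumerator() pulls nothing from the source, and only the first MoveNext() makes Skip consume the skipped items plus the one it returns.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipDemo
{
    // Hypothetical counter: how many items CountingSource has yielded so far.
    public static int Produced;

    // Yields 1..count, incrementing Produced just before each item is handed out.
    public static IEnumerable<int> CountingSource(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Returns how many items were produced before and after a single MoveNext(),
    // plus the value that MoveNext() landed on.
    public static (int before, int after, int value) Run(int count, int skip)
    {
        Produced = 0;
        using (IEnumerator<int> e = CountingSource(count).Skip(skip).GetEnumerator())
        {
            int before = Produced; // still 0: nothing has been enumerated yet
            e.MoveNext();          // Skip now pulls `skip` items, discards them, then takes one more
            return (before, Produced, e.Current);
        }
    }
}
```

For example, SkipDemo.Run(10, 3) gives before = 0, after = 4 and value = 4: no item is produced until MoveNext(), and then Skip reads the three skipped items plus the one it returns. That is exactly the work your skipToLine benchmark never triggered.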