
Efficient way to write a lot of lines to a text file

I started off doing something as follows:

using (TextWriter textWriter = new StreamWriter(filePath, append))
{
    foreach (MyClassA myClassA in myClassAs)
    {
        textWriter.WriteLine(myIO.GetCharArray(myClassA));

        if (myClassA.MyClassBs != null)
            myClassA.MyClassBs.ToList()
                .ForEach(myClassB =>
                    textWriter.WriteLine(myIO.GetCharArray(myClassB)));

        if (myClassA.MyClassCs != null)
            myClassA.MyClassCs.ToList()
                .ForEach(myClassC =>
                    textWriter.WriteLine(myIO.GetCharArray(myClassC)));
    }
}

This seemed pretty slow (~35 seconds for 35,000 lines).

Then I tried to follow the example here to create a buffer, with the following code, but it didn't gain me anything. I was still seeing times around 35 seconds. Is there an error in how I implemented the buffer?

using (TextWriter textWriter = new StreamWriter(filePath, append))
{
    char[] newLineChars = Environment.NewLine.ToCharArray();
    //Chunk through 500 lines at a time.
    int bufferSize = 500 * (RECORD_SIZE + newLineChars.Count());
    char[] buffer = new char[bufferSize];
    int recordLineSize = RECORD_SIZE + newLineChars.Count();
    int bufferIndex = 0;

    foreach (MyClassA myClassA in myClassAs)
    {
        IEnumerable<IMyClass> myClasses =
            new List<IMyClass> { myClassA }
                .Union(myClassA.MyClassBs)
                .Union(myClassA.MyClassCs);

        foreach (IMyClass myClass in myClasses)
        {
            Array.Copy(myIO.GetCharArray(myClass).Concat(newLineChars).ToArray(),
                0, buffer, bufferIndex, recordLineSize);

            bufferIndex += recordLineSize;

            if (bufferIndex >= bufferSize)
            {
                textWriter.Write(buffer);

                bufferIndex = 0;
            }
        }
    }

    if (bufferIndex > 0)
        textWriter.Write(buffer);
}

Is there a better way to accomplish this?

lintmouse asked Jun 26 '13 16:06



2 Answers

I strongly suspect that the majority of your time is not spent in the I/O. There's no way that it should take 35 seconds to write 35,000 lines, unless those lines are really long.

Most likely, the majority of time is spent in the GetCharArray method, whatever that does.
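One way to check this before touching the I/O code is to time the two phases separately. The following is only a minimal sketch: FormatRecord is a hypothetical stand-in for GetCharArray (whose implementation isn't shown in the question), and the 80-character record width is an assumption.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class WriteTiming
{
    // Hypothetical stand-in for myIO.GetCharArray; the real method is not shown in the question.
    static char[] FormatRecord(int id) => id.ToString().PadRight(80).ToCharArray();

    // Writes lineCount records to path and returns the accumulated time spent
    // building each record and the accumulated time spent writing it.
    public static (TimeSpan Format, TimeSpan Write) Measure(string path, int lineCount)
    {
        var formatTime = new Stopwatch();
        var writeTime = new Stopwatch();

        using (TextWriter writer = new StreamWriter(path, false))
        {
            for (int i = 0; i < lineCount; i++)
            {
                formatTime.Start();
                char[] record = FormatRecord(i);
                formatTime.Stop();

                writeTime.Start();
                writer.WriteLine(record);
                writeTime.Stop();
            }

            // Count the final flush as write time too.
            writeTime.Start();
            writer.Flush();
            writeTime.Stop();
        }

        return (formatTime.Elapsed, writeTime.Elapsed);
    }
}
```

Calling WriteTiming.Measure(Path.GetTempFileName(), 35000) and printing both values should show which phase dominates. Swap the real GetCharArray call in for FormatRecord to measure the actual program.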

A few suggestions:

If you really think I/O is the problem, increase the stream's buffer size. Call the StreamWriter constructor that lets you specify a buffer size. For example,

using (TextWriter textWriter = new StreamWriter(filePath, append, Encoding.UTF8, 65536))

That'll perform better than the default 4K buffer size. Going higher than 64K for the buffer size is not generally useful, and can actually decrease performance.

Don't pre-buffer lines or append to a StringBuilder. That might give you small performance increases, but at a huge cost in complexity. The small performance boost isn't worth the maintenance nightmare.

Take advantage of foreach. You have this code:

if (myClassA.MyClassBs != null)
    myClassA.MyClassBs.ToList()
        .ForEach(myClassB =>
            textWriter.WriteLine(myIO.GetCharArray(myClassB)));

That has to create a concrete list from whatever the MyClassBs collection is, and then enumerate it. Why not just enumerate the collection directly:

if (myClassA.MyClassBs != null)
{
    foreach (var myClassB in myClassA.MyClassBs)
    {
        textWriter.WriteLine(myIO.GetCharArray((myClassB)));
    }
}

That will save you the memory required by the ToList, and the time it takes to enumerate the collection when creating the list.
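If the same null check and loop are needed for several collections, they can be factored into one helper. This is a sketch under assumptions, not part of the original code: the RecordWriter class and WriteRecords method are hypothetical names, and the format delegate stands in for myIO.GetCharArray.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class RecordWriter
{
    // Writes one line per item, enumerating the sequence directly (no ToList copy).
    // A null sequence is treated as empty. Returns the number of lines written.
    public static int WriteRecords<T>(TextWriter writer, IEnumerable<T> items, Func<T, char[]> format)
    {
        if (items == null)
            return 0;

        int count = 0;
        foreach (T item in items)
        {
            writer.WriteLine(format(item));
            count++;
        }
        return count;
    }
}
```

With this in place, the loop body could become calls such as RecordWriter.WriteRecords(textWriter, myClassA.MyClassBs, b => myIO.GetCharArray(b)), with one call per collection.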

All that said, it's almost certain that your GetCharArray method is the thing that's taking all the time. If you really want to speed up your program, look there. Trying to optimize writing to the StreamWriter is a waste of time. You're not going to get significant performance increases there.

Jim Mischel answered Nov 18 '22 01:11



I threw together a simple snippet that I think is a bit cleaner, although I'm not quite sure what you are trying to accomplish. I also don't have any of your classes available, so I can't test it.

This sample does basically the same thing as your code, except that it uses some generic methods and does all the writing in one spot.

string filePath = "MickeyMouse.txt";
bool append = false;
List<MyClassA> myClassAs = new List<MyClassA> { new MyClassA() };
List<char[]> outputLines = new List<char[]>();

foreach (MyClassA myClassA in myClassAs)
{
    outputLines.Add(myIO.GetCharArray(myClassA));

    if (myClassA.MyClassBs != null)
        outputLines.AddRange(myClassA.MyClassBs.Select(myClassB => myIO.GetCharArray(myClassB)));

    if (myClassA.MyClassCs != null)
        outputLines.AddRange(myClassA.MyClassCs.Select(myClassC => myIO.GetCharArray(myClassC)));
}

var lines = outputLines.Select(line => new string(line));
if (append)
    File.AppendAllLines(filePath, lines);
else
    File.WriteAllLines(filePath, lines);

Here's the StringBuilder version:

string filePath = "MickeyMouse.txt";
bool append = false;
List<MyClassA> myClassAs = new List<MyClassA> { new MyClassA() };
StringBuilder outputLines = new StringBuilder();

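foreach (MyClassA myClassA in myClassAs)
{
    // Append(char[]) adds no line terminator, so end each record with AppendLine().
    outputLines.Append(myIO.GetCharArray(myClassA)).AppendLine();

    if (myClassA.MyClassBs != null)
        foreach (var myClassB in myClassA.MyClassBs)
            outputLines.Append(myIO.GetCharArray(myClassB)).AppendLine();

    if (myClassA.MyClassCs != null)
        foreach (var myClassC in myClassA.MyClassCs)
            outputLines.Append(myIO.GetCharArray(myClassC)).AppendLine();
}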

if (append)
    File.AppendAllText(filePath, outputLines.ToString());
else
    File.WriteAllText(filePath, outputLines.ToString());
John Kraft answered Nov 18 '22 02:11
