
Optimize C# file IO

Scenario: a 150 MB text file that is the exported Inbox of an old email account. I need to parse it, pull out the emails from a specific user, and write them to a new, single file. I have code that works; it's just dog slow.

I'm using marker strings to search for where to begin/end the copy from the original file.

Here's the main function:

 StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt");
        string working = string.Empty;
        string mystring = string.Empty;
        while (!sr.EndOfStream)
        {
            while ((mystring = sr.ReadLine()) != null)
            {
                if (mystring == strBeginMarker)
                {
                    writeLog(mystring);

                    //read the next line
                    working = sr.ReadLine();

                        while( !(working.StartsWith(strEndMarker)))
                        {
                            writeLog(working);
                            working = sr.ReadLine();

                        }
                  }
            }

        }
        this.Text = "DONE!!";
        sr.Close();

The function that writes the selected messages to the new file:

  public void writeLog(string sMessage)
    {
            fw = new System.IO.StreamWriter(path, true);
            fw.WriteLine(sMessage);
            fw.Flush();
            fw.Close();
    }

Again, this process works and I get a good output file. It just takes a long time, and I'm sure there are ways to make it faster.

paparush asked Jan 20 '11 19:01


3 Answers

The largest optimization would be to change your writeLog method to open the file once at the beginning of this operation, write to it many times, then close it at the end.

Right now, you're opening and closing the file for every single line you write, which will definitely slow things down.

Try the following:

// Open this once at the beginning!
using(fw = new System.IO.StreamWriter(path, true))
{
    using(StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt"))
    {
        string working;
        string mystring;
        while ((mystring = sr.ReadLine()) != null)
        {
           if (mystring == strBeginMarker)
           {
                // Write through the already-open writer; calling writeLog here
                // would try to reopen the same file for every message
                fw.WriteLine(mystring);

                //read the next line
                working = sr.ReadLine();

                // Stop at the end marker, or at end of file (ReadLine returns null)
                while (working != null && !working.StartsWith(strEndMarker))
                {
                    fw.WriteLine(working);
                    working = sr.ReadLine();
                }
            }
        }
    }
}
this.Text = "DONE!!";

Reed Copsey answered Nov 18 '22 07:11


I think you should:

  1. Open each file only once.
  2. Load the source file into memory.
  3. Split it into chunks and process them on several threads.
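
A minimal sketch of points 1 and 2, assuming the whole inbox fits comfortably in memory. The class and method names (`InboxFilter`, `CopyMarkedBlocks`, `Run`) and the parameter names are made up for illustration; the marker handling mirrors the question's loop (a line equal to the begin marker starts a block, a line starting with the end marker closes it and is not copied). Point 3 is left out: splitting across threads would need chunk boundaries that fall between messages, and the job is likely dominated by file I/O anyway.

```csharp
using System;
using System.IO;

static class InboxFilter
{
    // Copies each block that starts with a line equal to beginMarker, up to
    // (but not including) the next line that starts with endMarker.
    // Returns the number of lines written.
    public static int CopyMarkedBlocks(string[] lines, string beginMarker,
                                       string endMarker, TextWriter output)
    {
        int written = 0;
        bool inside = false;
        foreach (string line in lines)
        {
            if (!inside)
            {
                if (line == beginMarker)
                {
                    inside = true;
                    output.WriteLine(line);
                    written++;
                }
            }
            else if (line.StartsWith(endMarker, StringComparison.Ordinal))
            {
                inside = false;   // the end-marker line itself is skipped
            }
            else
            {
                output.WriteLine(line);
                written++;
            }
        }
        return written;
    }

    // Hypothetical entry point: paths and markers are supplied by the caller.
    public static void Run(string sourcePath, string targetPath,
                           string beginMarker, string endMarker)
    {
        string[] lines = File.ReadAllLines(sourcePath);          // source read once
        using (var writer = new StreamWriter(targetPath, true))  // output opened once (append)
        {
            CopyMarkedBlocks(lines, beginMarker, endMarker, writer);
        }
    }
}
```

Taking a `TextWriter` rather than a path keeps the copy logic independent of where the output goes, so it can be tried against a `StringWriter` before pointing it at a real file.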

acoolaum answered Nov 18 '22 06:11


I would just do a simple parser. Note that this assumes (as you do in your code above) that the markers are in fact unique.

You may have to adjust the formatting of your output a bit, but here is the general idea:

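   // Read the entire file and close it
   // (StringBuilder below needs "using System.Text;")
   string data;
   using (StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt"))
   {
       data = sr.ReadToEnd();
   }

   // StringBuilder avoids recopying the accumulated text on every append
   StringBuilder newData = new StringBuilder();
   int position = data.IndexOf(strBeginMarker);

   // IndexOf returns -1 when nothing is found; 0 is a valid match
   while (position >= 0)
   {
      int contentStart = position + strBeginMarker.Length;
      int endPosition = data.IndexOf(strEndMarker, contentStart);
      if (endPosition < 0)
         break;   // no closing marker: stop rather than throw in Append

      newData.Append(data, contentStart, endPosition - contentStart);

      // Resume the search after the end marker just consumed
      position = data.IndexOf(strBeginMarker, endPosition + strEndMarker.Length);
   }

   writeLog(newData.ToString());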

(Note that I don't have a 150 MB file to test this on - YMMV depending on the machine you are using).

Wonko the Sane answered Nov 18 '22 07:11