Scenario: a 150 MB text file which is the exported Inbox of an old email account. I need to parse through it, pull out the emails from a specific user, and write these to a new, single file. I have code that works; it's just dog slow.
I'm using marker strings to search for where to begin/end the copy from the original file.
Here's the main function:
StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt");
string working = string.Empty;
string mystring = string.Empty;
while (!sr.EndOfStream)
{
    while ((mystring = sr.ReadLine()) != null)
    {
        if (mystring == strBeginMarker)
        {
            writeLog(mystring);
            //read the next line
            working = sr.ReadLine();
            // working is null if the file ends before the end marker
            while (working != null && !working.StartsWith(strEndMarker))
            {
                writeLog(working);
                working = sr.ReadLine();
            }
        }
    }
}
this.Text = "DONE!!";
sr.Close();
The function that writes the selected messages to the new file:
public void writeLog(string sMessage)
{
    fw = new System.IO.StreamWriter(path, true);
    fw.WriteLine(sMessage);
    fw.Flush();
    fw.Close();
}
Again, this process works. I get a good output file; it just takes a long time, and I'm sure there are ways to make it faster.
The largest optimization would be to change your writeLog method so it opens the file once at the beginning of this operation, writes to it many times, and closes it at the end.
Right now you open, flush, and close the file every time you write a line, which will definitely slow things down.
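The open-once idea can also be kept behind the existing writeLog(string) call sites. Here is a minimal sketch using only System.IO. The LogWriter class and its WriteLog method are hypothetical names and not part of the original code:

```csharp
using System;
using System.IO;

// Hypothetical wrapper: opens the output file once and keeps it open,
// so each WriteLog call is a buffered write instead of open/write/flush/close.
public sealed class LogWriter : IDisposable
{
    private readonly StreamWriter _writer;

    public LogWriter(string path)
    {
        // append: true matches the original writeLog's StreamWriter(path, true)
        _writer = new StreamWriter(path, append: true);
    }

    // Replacement for writeLog(sMessage): no per-line Flush or Close
    public void WriteLog(string message) => _writer.WriteLine(message);

    // Flushes the buffer and closes the file once, at the end of the run
    public void Dispose() => _writer.Dispose();
}
```

You would create a single LogWriter with `using (var log = new LogWriter(path))` before the read loop and call `log.WriteLog(...)` wherever `writeLog(...)` was called. The file is flushed and closed once, when the using block ends.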
Try the following:
// Open this once at the beginning!
using (var fw = new System.IO.StreamWriter(path, true))
using (StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt"))
{
    string working;
    string mystring;
    while ((mystring = sr.ReadLine()) != null)
    {
        if (mystring == strBeginMarker)
        {
            // Write through the already-open writer; calling writeLog here
            // would reopen the file that fw is still holding open
            fw.WriteLine(mystring);
            //read the next line
            working = sr.ReadLine();
            // Stop at the end marker, or at end of file if it is missing
            while (working != null && !working.StartsWith(strEndMarker))
            {
                fw.WriteLine(working);
                working = sr.ReadLine();
            }
        }
    }
}
this.Text = "DONE!!";
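The same loop can be written against TextReader and TextWriter instead of concrete files, which also makes it easy to test on small inputs. This is a minimal sketch. MailExtractor and CopyBlocks are hypothetical names. Like the code above, it copies the begin-marker line but not the end-marker line:

```csharp
using System;
using System.IO;

public static class MailExtractor
{
    // Hypothetical helper: copies every block that starts with a line equal to
    // beginMarker, up to (not including) the next line that starts with endMarker.
    // Returns the number of blocks found.
    public static int CopyBlocks(TextReader input, TextWriter output,
                                 string beginMarker, string endMarker)
    {
        int blocks = 0;
        string line;
        while ((line = input.ReadLine()) != null)
        {
            if (line != beginMarker)
                continue;

            blocks++;
            output.WriteLine(line);
            // Copy until the end marker, or until end of input if it is missing
            while ((line = input.ReadLine()) != null &&
                   !line.StartsWith(endMarker, StringComparison.Ordinal))
            {
                output.WriteLine(line);
            }
        }
        return blocks;
    }
}
```

In the real program you would pass the StreamReader over the inbox and the single StreamWriter over the output file, for example `MailExtractor.CopyBlocks(sr, fw, strBeginMarker, strEndMarker)` inside the two using blocks.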
I think you should:
I would just write a simple parser. Note that this assumes, as your code above does, that the markers are in fact unique.
You may have to adjust the formatting of your output a bit, but here is the general idea:
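// needs: using System; using System.IO; using System.Text;
// Read the entire file and close it
string data;
using (StreamReader sr = new StreamReader("c:\\Thunderbird_Inbox.txt"))
{
    data = sr.ReadToEnd();
}

// StringBuilder avoids re-copying the whole result on every append
StringBuilder newData = new StringBuilder();
// Ordinal comparison is faster than the default culture-sensitive IndexOf
int position = data.IndexOf(strBeginMarker, StringComparison.Ordinal);
while (position >= 0) // IndexOf returns -1 when not found; 0 is a valid match
{
    int contentStart = position + strBeginMarker.Length;
    int endPosition = data.IndexOf(strEndMarker, contentStart, StringComparison.Ordinal);
    if (endPosition < 0)
        break; // begin marker with no end marker before end of file
    newData.Append(data, contentStart, endPosition - contentStart);
    // Resume searching after the end marker
    position = data.IndexOf(strBeginMarker, endPosition + strEndMarker.Length, StringComparison.Ordinal);
}
writeLog(newData.ToString());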
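(Note that I don't have a 150 MB file to test this on, so YMMV depending on the machine you are using. ReadToEnd keeps the whole file in memory as a single string, which takes roughly twice the file size for ASCII text because .NET strings are UTF-16.)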
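Because the parsing logic doesn't depend on the file's size, the IndexOf loop can be checked on small strings if it is pulled into a pure function. This sketch uses the same assumption of unique markers. MarkerParser and ExtractBetween are hypothetical names:

```csharp
using System;
using System.Text;

public static class MarkerParser
{
    // Hypothetical pure version of the IndexOf loop: returns the text between
    // each beginMarker and the following endMarker, concatenated.
    public static string ExtractBetween(string data, string beginMarker, string endMarker)
    {
        var result = new StringBuilder();
        int position = data.IndexOf(beginMarker, StringComparison.Ordinal);
        // IndexOf returns -1 when not found, so 0 is a valid match position
        while (position >= 0)
        {
            int contentStart = position + beginMarker.Length;
            int endPosition = data.IndexOf(endMarker, contentStart, StringComparison.Ordinal);
            if (endPosition < 0)
                break; // begin marker with no matching end marker
            result.Append(data, contentStart, endPosition - contentStart);
            position = data.IndexOf(beginMarker, endPosition + endMarker.Length, StringComparison.Ordinal);
        }
        return result.ToString();
    }
}
```

Unlike the line-based version, this matches markers anywhere in the text, so the begin marker does not have to occupy a whole line.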