I'm trying to parse some ddump files. Could you please help me speed up my algorithm?
It takes 216 ms for each loop, which is way too much. I would like to have it around 40-50 ms per loop. Maybe by using RegExp?
Here is my algorithm:
    while (pos < EntireFile.Length && (/*curr = */EntireFile.Substring(pos, EntireFile.Length - pos)).Contains(" class"))
    {
        w.Reset();
        w.Start();
        pos = EntireFile.ToLower().IndexOf(" class", pos) + 6;
        int end11 = EntireFile.ToLower().IndexOf("extends", pos);
        if (end11 == -1)
            end11 = EntireFile.IndexOf("\r\n", pos);
        else
        {
            int end22 = EntireFile.IndexOf("\r\n", pos);
            if (end22 < end11)
                end11 = end22;
        }
        //string opcods = EntireFile.Substring(pos, EntireFile.Length - pos);
        string Cname = EntireFile.Substring(pos, end11 - pos).Trim();
        pos += (end11 - pos) + 7;
        pos = EntireFile.IndexOf("{", pos) + 1;
        int count = 1;
        string searching = EntireFile.Substring(pos, EntireFile.Length - pos);
        int searched = 0;
        while (count != 0)
        {
            if (searching[searched] == '{')
                count++;
            else if (searching[searched] == '}')
                count--;
            searched++;
        }
        string Content = EntireFile.Substring(pos, searched);
        tlist.Add(new TClass() { ClassName = Cname, Content = Content });
        pos += searched;
        if (pos % 3 == 0)
        {
            double prc = ((double)pos) * 100d / ((double)EntireFile.Length);
            int prcc = (int)Math.Round(prc);
            wnd.UpdateStatus(prcc);
            wnd.Update();
        }
        mils.Add((int)w.ElapsedMilliseconds);
    }
Any help would be greatly appreciated.
Well, calling EntireFile.ToLower() multiple times certainly will not help. There are several things you can do:
- Perform expensive operations (ToLower, IndexOf, etc.) only once and cache the results if possible.
- Do not shrink the input with Substring; this will kill your performance. Rather, keep a separate int parseStart value and use that as an additional parameter to all of your IndexOf calls. In other words, keep track of the part of the file you have parsed manually instead of taking a smaller substring each time.

The performance problems you have are in large part related to overhead from all the string copy operations.
There are overloads that let you specify the valid range of your string operations. If you eliminate the copying by simply using an index to virtually substring the entire string, that will make a difference.
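As a rough illustration of searching in place, here is a minimal sketch that walks a string with the IndexOf overload that takes a start index, so no Substring copy is made per iteration. The class name ClassScanner, the method FindClassOffsets, and the variable parseStart are hypothetical names chosen for this example; only the " class" token comes from the question.

```csharp
using System;
using System.Collections.Generic;

static class ClassScanner
{
    // Returns the offset of every " class" token in the input.
    // The input string is never copied or shortened; parseStart marks
    // how far the scan has progressed.
    public static List<int> FindClassOffsets(string entireFile)
    {
        const string token = " class";
        var offsets = new List<int>();
        int parseStart = 0;

        while (parseStart < entireFile.Length)
        {
            // Search from parseStart onward in the original string.
            int hit = entireFile.IndexOf(token, parseStart, StringComparison.Ordinal);
            if (hit < 0)
                break; // no further matches

            offsets.Add(hit);
            parseStart = hit + token.Length; // continue just past this match
        }

        return offsets;
    }
}
```

The same start-index pattern should apply to the "extends", "\r\n" and "{" searches in the question, so the loop condition would no longer need Substring(...).Contains(...) either.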
Also, case-insensitive comparisons are not made by lowering or upping the string! You use the StringComparer class or the StringComparison enumeration. There are many string overloads that let you specify whether to consider case sensitivity.
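For example, the IndexOf overload that accepts a StringComparison value can replace the ToLower().IndexOf(...) pattern from the question. This is a minimal sketch; the wrapper class CaseInsensitiveSearch and its method IndexOfIgnoreCase are hypothetical names used only for illustration, and OrdinalIgnoreCase is assumed to be the appropriate comparison for keywords such as "class" and "extends".

```csharp
using System;

static class CaseInsensitiveSearch
{
    // Finds token in text starting at startIndex, ignoring case,
    // without allocating a lowercased copy of text.
    public static int IndexOfIgnoreCase(string text, string token, int startIndex)
    {
        return text.IndexOf(token, startIndex, StringComparison.OrdinalIgnoreCase);
    }
}
```

Called as CaseInsensitiveSearch.IndexOfIgnoreCase(EntireFile, " class", pos), this returns an index into the original string, so it can be used directly as the next start position.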
Indexing a string repeatedly using the square bracket notation is also very expensive. If you look at the implementation of the string operations in .NET they always turn the search string into a char array because that's faster to work with. However, that means that a lot of copying is still taking place even for read only search operations.
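In the spirit of the paragraph above, one option for the brace-matching loop is to convert the file to a char[] once, up front, and scan that array in place from an offset, instead of copying the remainder of the file with Substring on every pass. The sketch below assumes that setup; BraceMatcher and FindBlockEnd are hypothetical names, and whether the array is measurably faster than the string indexer is something to profile rather than take for granted.

```csharp
using System;

static class BraceMatcher
{
    // chars:     the whole file, converted once with ToCharArray().
    // bodyStart: index of the first character after the opening '{'.
    // Returns the index just past the matching '}', or -1 if the
    // braces are unbalanced (the question's loop would throw instead).
    public static int FindBlockEnd(char[] chars, int bodyStart)
    {
        int depth = 1; // the opening '{' has already been consumed

        for (int i = bodyStart; i < chars.Length; i++)
        {
            char c = chars[i];
            if (c == '{')
            {
                depth++;
            }
            else if (c == '}')
            {
                depth--;
                if (depth == 0)
                    return i + 1;
            }
        }

        return -1;
    }
}
```

With this, the class body would be the range from bodyStart up to the returned index, which can be extracted with a single Substring call per class rather than one copy of the remaining file.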