Speed Up Parsing Algorithm

Question

I'm trying to parse some ddump files. Could you please help me speed up my algorithm?
Each loop iteration takes 216 ms, which is far too much. I would like to get it down to around 40-50 ms per iteration. Maybe by using RegExp?

Here is my algorithm:

    while (pos < EntireFile.Length && (/*curr = */EntireFile.Substring(pos, EntireFile.Length - pos)).Contains(" class"))
    {
        w.Reset();
        w.Start();
        pos = EntireFile.ToLower().IndexOf(" class", pos) + 6;
        int end11 = EntireFile.ToLower().IndexOf("extends", pos);
        if (end11 == -1)
            end11 = EntireFile.IndexOf("\n", pos);
        else
        {
            int end22 = EntireFile.IndexOf("\n", pos);
            if (end22 < end11)
                end11 = end22;
        }
        //string opcods = EntireFile.Substring(pos, EntireFile.Length - pos);
        string Cname = EntireFile.Substring(pos, end11 - pos).Trim();
        pos += (end11 - pos) + 7;
        pos = EntireFile.IndexOf("{", pos) + 1;

        int count = 1;
        string searching = EntireFile.Substring(pos, EntireFile.Length - pos);
        int searched = 0;
        while (count != 0)
        {
            if (searching[searched] == '{')
                count++;
            else if (searching[searched] == '}')
                count--;

            searched++;
        }
        string Content = EntireFile.Substring(pos, searched);
        tlist.Add(new TClass() { ClassName = Cname, Content = Content });
        pos += searched;

        if (pos % 3 == 0)
        {
            double prc = ((double)pos) * 100d / ((double)EntireFile.Length);
            int prcc = (int)Math.Round(prc);
            wnd.UpdateStatus(prcc);
            wnd.Update();
        }
        mils.Add((int)w.ElapsedMilliseconds);
    }

Any help would be greatly appreciated.

Jon · Accepted Answer

Well, doing this multiple times

EntireFile.ToLower()

certainly will not help. There are several things you can do:

Perform costly operations (ToLower, IndexOf, etc.) only once and cache the results if possible.
Do not narrow down the input you are processing with Substring; this will kill your performance, because every call copies the rest of the file. Rather, keep a separate int parseStart value and pass it as an additional parameter to all of your IndexOf calls. In other words, manually keep track of how much of the file you have already parsed instead of taking a smaller substring each time.
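As a rough illustration of both points, here is a minimal sketch that lower-cases the text once and then walks it with a running start index. The names ClassScanner and FindClassMarkers are made up for this example and are not part of the original code; it only locates the " class" markers and leaves the rest of the parsing out.

```csharp
using System;
using System.Collections.Generic;

static class ClassScanner
{
    // Hypothetical helper: returns the index just past every " class" marker.
    public static List<int> FindClassMarkers(string entireFile)
    {
        const string marker = " class";

        // Lower-case ONCE, outside the loop, instead of once per IndexOf call.
        string lowered = entireFile.ToLowerInvariant();

        var positions = new List<int>();
        int parseStart = 0; // how far we have parsed; no Substring needed

        while (true)
        {
            int hit = lowered.IndexOf(marker, parseStart, StringComparison.Ordinal);
            if (hit < 0)
                break; // no more markers, so no separate Contains() check is required

            parseStart = hit + marker.Length;
            positions.Add(parseStart);
        }
        return positions;
    }
}
```

The loop condition in the question (Substring followed by Contains) disappears entirely here, because the IndexOf result already says whether another marker exists.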

John Leidegren · Answer

The performance problems you have are in large part due to the overhead of all the string copy operations.

There are overloads that let you specify the valid range of your string operations. If you eliminate the copying by using an index to virtually substring the entire string, that will make a difference.
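For instance, String.IndexOf has an overload taking a start index and a count, which restricts the search to a window of the original string. The sketch below uses it to read a class name that ends at "extends" or at the end of the line, whichever comes first. RangeSearch and ReadClassName are hypothetical names, and treating "\n" as the line terminator is an assumption.

```csharp
using System;

static class RangeSearch
{
    // Hypothetical helper: reads the class name that begins at nameStart.
    public static string ReadClassName(string text, int nameStart)
    {
        // Find the end of the current line (assumes '\n' line endings).
        int lineEnd = text.IndexOf('\n', nameStart);
        if (lineEnd < 0)
            lineEnd = text.Length;

        // Search for "extends" only inside [nameStart, lineEnd); nothing is copied.
        int ext = text.IndexOf("extends", nameStart, lineEnd - nameStart,
                               StringComparison.Ordinal);

        int nameEnd = ext >= 0 ? ext : lineEnd;

        // The only copy made is the short class name itself.
        return text.Substring(nameStart, nameEnd - nameStart).Trim();
    }
}
```

Limiting the search to the current line also avoids scanning the rest of the file for "extends" when a class does not extend anything.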

Also, case-insensitive comparisons should not be made by lowercasing or uppercasing the string! Use the StringComparer class or the StringComparison enumeration instead. Many string method overloads let you specify whether case should be ignored.
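A small sketch of what that looks like with StringComparison.OrdinalIgnoreCase. The wrapper name IndexOfIgnoreCase is invented for illustration; the IndexOf and Equals overloads it calls are the standard ones.

```csharp
using System;

static class CaseInsensitiveSearch
{
    // Hypothetical wrapper: case-insensitive search without creating a lowered copy.
    public static int IndexOfIgnoreCase(string text, string value, int startIndex)
    {
        return text.IndexOf(value, startIndex, StringComparison.OrdinalIgnoreCase);
    }

    // Case-insensitive equality check, again without ToLower/ToUpper.
    public static bool SameKeyword(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

In the question's loop, a call such as EntireFile.IndexOf(" class", pos, StringComparison.OrdinalIgnoreCase) would replace EntireFile.ToLower().IndexOf(" class", pos), so the whole file is no longer copied on each iteration.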

Indexing a string repeatedly using the square bracket notation is also very expensive. If you look at the implementation of the string operations in .NET, they always turn the search string into a char array because that's faster to work with. However, that means that a lot of copying is still taking place, even for read-only search operations.
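One way to scan for the matching brace without copying the remainder of the file is to take a ReadOnlySpan<char> view over the original string. This is only a sketch and assumes a runtime where AsSpan is available (.NET Core 2.1 or later, which the original question may predate); BraceMatcher and FindBlockEnd are hypothetical names. Unlike the loop in the question, it also stops at the end of the input when the braces are unbalanced.

```csharp
using System;

static class BraceMatcher
{
    // Hypothetical helper: bodyStart is the index right after an opening '{'.
    // Returns the index just past the matching '}', or -1 if it is never closed.
    public static int FindBlockEnd(string text, int bodyStart)
    {
        // A span is a view over the existing characters; no copy is made.
        ReadOnlySpan<char> rest = text.AsSpan(bodyStart);

        int depth = 1; // we are already inside one '{'
        for (int i = 0; i < rest.Length; i++)
        {
            char c = rest[i];
            if (c == '{')
            {
                depth++;
            }
            else if (c == '}')
            {
                depth--;
                if (depth == 0)
                    return bodyStart + i + 1;
            }
        }
        return -1; // unbalanced braces
    }
}
```

The class body is then the range from bodyStart up to the returned index, and a single Substring call for that range is the only copy needed.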

Tags: c#, algorithm, parsing, refactoring

alex
