 

Speed Up Parsing Algorithm

I'm trying to parse some ddump files; could you please help me speed up my algorithm?
It takes 216 ms per loop iteration, which is way too much. I would like to get it down to around 40-50 ms per iteration. Maybe by using a regex?

Here is my algorithm:

    while (pos < EntireFile.Length && (/*curr = */EntireFile.Substring(pos, EntireFile.Length - pos)).Contains(" class"))
    {
        w.Reset();
        w.Start();
        pos = EntireFile.ToLower().IndexOf(" class", pos) + 6;
        int end11 = EntireFile.ToLower().IndexOf("extends", pos);
        if (end11 == -1)
            end11 = EntireFile.IndexOf("\r\n", pos);
        else
        {
            int end22 = EntireFile.IndexOf("\r\n", pos);
            if (end22 < end11)
                end11 = end22;
        }
        //string opcods = EntireFile.Substring(pos, EntireFile.Length - pos);
        string Cname = EntireFile.Substring(pos, end11 - pos).Trim();
        pos += (end11 - pos) + 7;
        pos = EntireFile.IndexOf("{", pos) + 1;

        int count = 1;
        string searching = EntireFile.Substring(pos, EntireFile.Length - pos);
        int searched = 0;
        while (count != 0)
        {
            if (searching[searched] == '{')
                count++;
            else if (searching[searched] == '}')
                count--;

            searched++;
        }
        string Content = EntireFile.Substring(pos, searched);
        tlist.Add(new TClass() { ClassName = Cname, Content = Content });
        pos += searched;

        if (pos % 3 == 0)
        {
            double prc = ((double)pos) * 100d / ((double)EntireFile.Length);
            int prcc = (int)Math.Round(prc);
            wnd.UpdateStatus(prcc);
            wnd.Update();
        }
        mils.Add((int)w.ElapsedMilliseconds);
    }

Any help would be greatly appreciated.

asked Dec 12 '22 by alex

2 Answers

Well, doing this multiple times

EntireFile.ToLower()

certainly will not help. There are several things you can do:

  1. Perform costly operations (ToLower, IndexOf, etc.) only once, and cache the results where possible.
  2. Do not repeatedly narrow down the input with Substring; that will kill your performance. Instead, keep a separate int parseStart value and pass it as the startIndex argument to all of your IndexOf calls. In other words, track how far you have parsed manually instead of taking a smaller substring each time, as in the sketch below.
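Putting both points together, here is a minimal sketch of the same scan, assuming the EntireFile string and the tlist/TClass types from the question (the Stopwatch timing and progress reporting are omitted). The case folding moves into an IndexOf overload, and an int cursor replaces the per-iteration Substring copies:

    int pos = 0;
    while (true)
    {
        // Case-insensitive search without allocating a lowered copy of the file.
        int classPos = EntireFile.IndexOf(" class", pos, StringComparison.OrdinalIgnoreCase);
        if (classPos == -1)
            break;
        pos = classPos + 6;

        // The class name ends at "extends" or at the end of the line, whichever comes first.
        int end = EntireFile.IndexOf("extends", pos, StringComparison.OrdinalIgnoreCase);
        int lineEnd = EntireFile.IndexOf("\r\n", pos, StringComparison.Ordinal);
        if (end == -1 || (lineEnd != -1 && lineEnd < end))
            end = lineEnd;
        if (end == -1)
            break; // malformed tail; nothing more to parse

        string className = EntireFile.Substring(pos, end - pos).Trim();

        int open = EntireFile.IndexOf('{', end);
        if (open == -1)
            break;
        pos = open + 1;

        // Count braces directly against the original string; no temporary copy.
        int depth = 1, i = pos;
        while (depth != 0 && i < EntireFile.Length)
        {
            if (EntireFile[i] == '{') depth++;
            else if (EntireFile[i] == '}') depth--;
            i++;
        }

        // Only one Substring per class: the extracted body itself.
        tlist.Add(new TClass { ClassName = className, Content = EntireFile.Substring(pos, i - pos) });
        pos = i;
    }

This removes the O(n) ToLower() and Substring() calls from every iteration, which is most likely where the 216 ms per iteration was going.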
answered Dec 24 '22 by Jon


The performance problems you are seeing are in large part caused by the overhead of all the string copy operations.

There are overloads that let you restrict string operations to a given range. If you eliminate the copying by simply keeping an index and treating it as a virtual substring of the entire string, that alone will make a difference.
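For instance, with a hypothetical pos cursor, both of these operate on a range of the original string without allocating a copy:

    // Search from an offset instead of substringing first.
    int brace = EntireFile.IndexOf('{', pos);

    // Compare a 7-character slice of EntireFile against "extends" without copying either string.
    bool isExtends = string.CompareOrdinal(EntireFile, pos, "extends", 0, 7) == 0;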

Also, case-insensitive comparisons are not done by lower- or upper-casing the string! Use the StringComparer class or the StringComparison enumeration instead; many string overloads let you specify how case should be handled.
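For example (a sketch, not the poster's exact code), both of these are case-insensitive without allocating a lowered copy of anything:

    // Case-insensitive search; no ToLower() copy of the haystack.
    int hit = EntireFile.IndexOf(" class", pos, StringComparison.OrdinalIgnoreCase);

    // Case-insensitive equality test for a hypothetical keyword variable.
    bool same = string.Equals(keyword, "extends", StringComparison.OrdinalIgnoreCase);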

Indexing a string repeatedly with the square-bracket notation is also comparatively expensive. If you look at the implementation of the string operations in .NET, they turn the search string into a char array internally because that is faster to work with; that means a lot of copying still takes place even for read-only search operations.
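If the brace-counting loop shows up in profiling, one option along these lines (a sketch; whether it actually wins should be measured) is to pull the characters out once up front and scan the array:

    // One copy of the characters, then all read-only scanning hits the array.
    char[] chars = EntireFile.ToCharArray();

    int depth = 1, i = pos; // pos is assumed to point just past an opening brace
    while (depth != 0 && i < chars.Length)
    {
        if (chars[i] == '{') depth++;
        else if (chars[i] == '}') depth--;
        i++;
    }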

answered Dec 24 '22 by John Leidegren