Regular Expressions slowing down the program

Tags: c#, regex, parsing

I'm trying to create a program that parses data from a game's chat log. So far I have managed to get the program to work and parse the data that I want, but my problem is that the program is getting slower.

Currently it takes 5 seconds to parse a 10 MB text file, and I noticed it drops to 3 seconds if I add RegexOptions.Compiled to my regexes.

I believe I have narrowed the problem down to my regex matches. Each line is currently scanned 5 times, once per regex, so the program will get even slower when I add more regexes later.

What should I do so that my program does not slow down with multiple regexes? All suggestions to make the code better are appreciated!

        if (sender.Equals(ButtonParse))
        {
            var totalShots = 0f;
            var totalHits = 0f;
            var misses = 0;
            var crits = 0;

            var regDmg = new Regex(@"(?<=\bSystem\b.* You inflicted )\d+.\d", RegexOptions.Compiled);
            var regMiss = new Regex(@"(?<=\bSystem\b.* Target evaded attack)", RegexOptions.Compiled);
            var regCrit = new Regex(@"(?<=\bSystem\b.* Critical hit - additional damage)", RegexOptions.Compiled);
            var regHeal = new Regex(@"(?<=\bSystem\b.* You healed yourself )\d+.\d", RegexOptions.Compiled);
            var regDmgrec = new Regex(@"(?<=\bSystem\b.* You take )\d+.\d", RegexOptions.Compiled);

            var dmgList = new List<float>(); //New list for damage values
            var healList = new List<float>(); //New list for heal values
            var dmgRecList = new List<float>(); //New list for damage received values

            using (var sr = new StreamReader(TextBox1.Text))
            {
                while (!sr.EndOfStream)
                {
                    var line = sr.ReadLine();

                    var match = regDmg.Match(line);
                    var match2 = regMiss.Match(line);
                    var match3 = regCrit.Match(line);
                    var match4 = regHeal.Match(line);
                    var match5 = regDmgrec.Match(line);

                    if (match.Success)
                    {
                        dmgList.Add(float.Parse(match.Value, CultureInfo.InvariantCulture));
                        totalShots++;
                        totalHits++;
                    }
                    if (match2.Success)
                    {
                        misses++;
                        totalShots++;
                    }
                    if (match3.Success)
                    {
                        crits++;
                    }
                    if (match4.Success)
                    {
                        healList.Add(float.Parse(match4.Value, CultureInfo.InvariantCulture));
                    }
                    if (match5.Success)
                    {
                        dmgRecList.Add(float.Parse(match5.Value, CultureInfo.InvariantCulture));
                    }
                }
                TextBlockTotalShots.Text = totalShots.ToString(); //Show total shots
                TextBlockTotalDmg.Text = dmgList.Sum().ToString("0.##"); //Show total damage inflicted

                TextBlockTotalHits.Text = totalHits.ToString(); //Show total hits
                var hitChance = totalHits / totalShots; //Calculate hit chance
                TextBlockHitChance.Text = hitChance.ToString("P"); //Show hit chance

                TextBlockTotalMiss.Text = misses.ToString(); //Show total misses
                var missChance = misses / totalShots; //Calculate miss chance
                TextBlockMissChance.Text = missChance.ToString("P"); //Show miss chance

                TextBlockTotalCrits.Text = crits.ToString(); //Show total crits
                var critChance = crits / totalShots; //Calculate crit chance
                TextBlockCritChance.Text = critChance.ToString("P"); //Show crit chance

                TextBlockDmgHealed.Text = healList.Sum().ToString("F1"); //Show damage healed

                TextBlockDmgReceived.Text = dmgRecList.Sum().ToString("F1"); //Show damage received

                var pedSpent = dmgList.Sum() / (float.Parse(TextBoxEco.Text, CultureInfo.InvariantCulture) * 100); //Calculate ped spent
                TextBlockPedSpent.Text = pedSpent.ToString("0.##") + " PED"; //Estimated ped spent
            }
        }

And here's a sample text:

2014-09-02 23:07:22 [System] [] You inflicted 45.2 points of damage.
2014-09-02 23:07:23 [System] [] You inflicted 45.4 points of damage.
2014-09-02 23:07:24 [System] [] Target evaded attack.
2014-09-02 23:07:25 [System] [] You inflicted 48.4 points of damage.
2014-09-02 23:07:26 [System] [] You inflicted 48.6 points of damage.
2014-10-15 12:39:55 [System] [] Target evaded attack.
2014-10-15 12:39:58 [System] [] You inflicted 56.0 points of damage.
2014-10-15 12:39:59 [System] [] You inflicted 74.6 points of damage.
2014-10-15 12:40:02 [System] [] You inflicted 78.6 points of damage.
2014-10-15 12:40:04 [System] [] Target evaded attack.
2014-10-15 12:40:06 [System] [] You inflicted 66.9 points of damage.
2014-10-15 12:40:08 [System] [] You inflicted 76.2 points of damage.
2014-10-15 12:40:12 [System] [] You take 18.4 points of damage.
2014-10-15 12:40:14 [System] [] You inflicted 76.1 points of damage.
2014-10-15 12:40:17 [System] [] You inflicted 88.5 points of damage.
2014-10-15 12:40:19 [System] [] You inflicted 69.0 points of damage.
2014-10-19 05:56:30 [System] [] Critical hit - additional damage! You inflict 275.4 points of damage.
2014-10-19 05:59:29 [System] [] You inflicted 92.8 points of damage.
2014-10-19 05:59:31 [System] [] Critical hit - additional damage! You inflict 251.5 points of damage.
2014-10-19 05:59:35 [System] [] You take 59.4 points of damage.
2014-10-19 05:59:39 [System] [] You healed yourself 84.0 points.

Asked Nov 11 '14 at 22:11 by S-T


1 Answer

Here are the issues as I see them:

  1. As suggested in the comments, don't make the regex parser do far more work than a basic pattern needs.
  2. Why parse the same text multiple times? Create one regex pattern that does all the work in a single scan over each line.
  3. In WPF, don't hold up the GUI thread to do work. Do the work in a background task and update a view model (you are using MVVM, right?), which will propagate the info to the screen through INotifyPropertyChanged events.
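
To illustrate point 1, here is a minimal sketch of reading a value with plain string operations instead of a regex. The class and method names (`ChatLineParser`, `TryReadValueAfter`) are hypothetical and not part of the original post, and the sketch assumes the number always directly follows a fixed marker such as "You inflicted ", as in the sample log.

```csharp
using System;
using System.Globalization;

public static class ChatLineParser
{
    // Hypothetical helper: finds `marker` in `line` and parses the number
    // that immediately follows it. Returns false if the marker is absent
    // or the text after it is not a number.
    public static bool TryReadValueAfter(string line, string marker, out double value)
    {
        value = 0;

        // Ordinal search: no culture rules, just a character comparison.
        int start = line.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0)
            return false;
        start += marker.Length;

        // Assumption: the number ends at the next space (or at the end of the line).
        int end = line.IndexOf(' ', start);
        if (end < 0)
            end = line.Length;

        return double.TryParse(line.Substring(start, end - start),
                               NumberStyles.Float,
                               CultureInfo.InvariantCulture,
                               out value);
    }
}
```

A caller could first skip lines that do not contain "[System]" (for example with `line.IndexOf("[System]", StringComparison.Ordinal) < 0`) and then try each marker in turn. Whether this beats a single regex on a real log should be measured rather than assumed.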

The following is a single-pattern solution that works on a line-by-line basis. Its first task is to verify that the line contains [System]; if it does not, nothing on that line is matched. If it does, the pattern looks for specific keywords and optional values and places them into named capture groups, which act as key/value pairs.

Once that is done, LINQ sums up the values found. Note that the pattern is commented; RegexOptions.IgnorePatternWhitespace makes the regex parser ignore the comments and the whitespace.

string pattern = @"^       # Beginning of line to anchor it.
(?=.+\[System\])           # Within the line a literal '[System]' has to occur
(?=.+                      # Somewhere within that line search for these keywords:
  (?<Action>               # Named Match Capture Group 'Action' will hold a keyword.
          inflicte?d?      # if the line has inflict or inflicted put it into 'Action'
          |                # or
          evaded           # evaded
          | take           # or take
          | yourself       # or yourself (heal)
   )
  (\s(?<Value>[\d.]+))?)   # if a value of points exist place into 'Value'
.+                         # match one or more to complete it.
$                          #end of line to stop on";

// Requires: using System.Globalization; using System.Linq; using System.Text.RegularExpressions;
// IgnorePatternWhitespace only allows us to comment the pattern. It does not affect processing.
var tokens =
   Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline)
        .OfType<Match>()
        .Select( mt => new {
                            Action = mt.Groups["Action"].Value,
                            // Parse with the invariant culture so "45.2" is read the same on every machine.
                            Value  = mt.Groups["Value"].Success
                                         ? double.Parse(mt.Groups["Value"].Value, CultureInfo.InvariantCulture)
                                         : 0,
                            Count  = 1,
                           })
         .GroupBy ( itm => itm.Action,  // Each action is grouped under its name for summing.
                    itm => itm,         // Each item is kept whole so its Count and Value can be summed within the group.
                    (action, values) => new
                            {
                                Action = action,
                                Count  = values.Sum(itm => itm.Count),
                                Total  = values.Sum(itm => itm.Value)
                             }
                         );
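
For a large log file, the same single-scan idea can also be applied while streaming the file one line at a time, as the question's code does, instead of loading all the text into one string. The following is only a sketch under assumptions: `ChatLogStats`, `Totals`, and `Summarize` are hypothetical names, and the pattern is a simplified variant written for the sample lines, not the commented pattern above.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text.RegularExpressions;

public static class ChatLogStats
{
    // Assumed, simplified pattern: require "[System]", then capture the first
    // keyword after it and, if present, the number that follows the keyword.
    private static readonly Regex LinePattern = new Regex(
        @"\[System\].*?\b(?<Action>inflict(?:ed)?|evaded|take|yourself)\b(?:\s(?<Value>\d+(?:\.\d+)?))?",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Running count and sum for one action keyword.
    public sealed class Totals
    {
        public int Count;
        public double Sum;
    }

    // Reads the log line by line and runs the regex once per line.
    public static Dictionary<string, Totals> Summarize(TextReader reader)
    {
        var result = new Dictionary<string, Totals>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match m = LinePattern.Match(line);
            if (!m.Success)
                continue; // not a [System] line, or no known keyword

            string action = m.Groups["Action"].Value;
            Totals totals;
            if (!result.TryGetValue(action, out totals))
            {
                totals = new Totals();
                result[action] = totals;
            }

            totals.Count++;
            if (m.Groups["Value"].Success)
                totals.Sum += double.Parse(m.Groups["Value"].Value, CultureInfo.InvariantCulture);
        }
        return result;
    }
}
```

With the question's code, this could be called as `ChatLogStats.Summarize(sr)` inside the existing `using (var sr = new StreamReader(TextBox1.Text))` block. As in the LINQ version, "inflict" (the critical-hit lines) and "inflicted" end up under separate keys.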

Result

The LINQ query returns one entry per action. Each entry holds the number of times that action occurred and the sum of all values found for it.


DATA

string data=@"2014-09-02 23:07:22 [System] [] You inflicted 45.2 points of damage.
2014-09-02 23:07:23 [System] [] You inflicted 45.4 points of damage.
2014-09-02 23:07:24 [System] [] Target evaded attack.
2014-09-02 23:07:25 [System] [] You inflicted 48.4 points of damage.
2014-09-02 23:07:26 [System] [] You inflicted 48.6 points of damage.
2014-10-15 12:39:55 [System] [] Target evaded attack.
2014-10-15 12:39:58 [System] [] You inflicted 56.0 points of damage.
2014-10-15 12:39:59 [System] [] You inflicted 74.6 points of damage.
2014-10-15 12:40:02 [System] [] You inflicted 78.6 points of damage.
2014-10-15 12:40:04 [System] [] Target evaded attack.
2014-10-15 12:40:06 [System] [] You inflicted 66.9 points of damage.
2014-10-15 12:40:08 [System] [] You inflicted 76.2 points of damage.
2014-10-15 12:40:12 [System] [] You take 18.4 points of damage.
2014-10-15 12:40:14 [System] [] You inflicted 76.1 points of damage.
2014-10-15 12:40:17 [System] [] You inflicted 88.5 points of damage.
2014-10-15 12:40:19 [System] [] You inflicted 69.0 points of damage.
2014-10-19 05:56:30 [System] [] Critical hit - additional damage! You inflict 275.4 points of damage.
2014-10-19 05:59:29 [System] [] You inflicted 92.8 points of damage.
2014-10-19 05:59:31 [System] [] Critical hit - additional damage! You inflict 251.5 points of damage.
2014-10-19 05:59:35 [System] [] You take 59.4 points of damage.
2014-10-19 05:59:39 [System] [] You healed yourself 84.0 points.";
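
Point 3 from the list at the top of this answer (keep the parsing off the GUI thread) might be sketched as follows. Everything here is illustrative: `ParseResultViewModel`, `TotalDamage`, and `LoadAsync` are hypothetical names, and the `parse` delegate stands in for whichever parsing routine is used.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.Threading.Tasks;

// Hypothetical view model; a WPF TextBlock would bind to TotalDamage.
public sealed class ParseResultViewModel : INotifyPropertyChanged
{
    private string _totalDamage = "";

    public event PropertyChangedEventHandler PropertyChanged;

    public string TotalDamage
    {
        get { return _totalDamage; }
        private set
        {
            _totalDamage = value;
            // Notify any bound control that the value changed.
            PropertyChangedEventHandler handler = PropertyChanged;
            if (handler != null)
                handler(this, new PropertyChangedEventArgs(nameof(TotalDamage)));
        }
    }

    // `parse` is whatever slow routine computes the total damage.
    public async Task LoadAsync(Func<double> parse)
    {
        // Task.Run moves the slow work to a thread-pool thread.
        double total = await Task.Run(parse);

        // When awaited from the WPF UI thread, execution resumes on that
        // thread here, so setting the bound property is safe.
        TotalDamage = total.ToString("0.##", CultureInfo.InvariantCulture);
    }
}
```

A button handler marked `async` could then call `await viewModel.LoadAsync(() => ParseFile(path));`, where `ParseFile` is a placeholder for the actual parsing method, and the window stays responsive while the file is read.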

Answered Oct 01 '22 at 19:10 by ΩmegaMan