Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can A MatchCollection hang the program when trying to iterate it?

Tags:

c#

regex

I have a code example where a MatchCollection seems to hang the program when trying to use it with foreach.

I am parsing css using a class CSSParser:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Helpers.Extensions;

namespace Helpers.Utils
{
    public class CSSParser
    {
        private readonly Dictionary<string, Dictionary<string, string>>
            _dict = new Dictionary<string, Dictionary<string, string>>();

        private const string SelectorKey = "selector";
        private const string NameKey = "name";
        private const string ValueKey = "value";

        private const string GroupsPattern
            = @"(?<selector>(?:(?:[^,{]+)\s*,?\s*)+)\{(?:(?<name>[^}:]+)\s*:\s*(?<value>[^};]+);?\s*)*\}";

        private const string CommentsPattern
            = @"(?<!"")\/\*.+?\*\/(?!"")";

        private readonly Regex _pattern
            = new Regex(GroupsPattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);

        public CSSParser(string cssString)
        {
            var noCommentsString = Regex.Replace(cssString, CommentsPattern, "");
            var matches = _pattern.Matches(noCommentsString);

            foreach (Match item in matches)
            {
                var selector = item.Groups[SelectorKey].Captures[0].Value.Trim();

                var selectorParts = selector.Split(',').Select(s=>s.Trim());
                foreach(var part in selectorParts)
                {
                    if (!_dict.ContainsKey(part))
                      _dict[part] = new Dictionary<string, string>();
                }

                var classNameCaptures = item.Groups[NameKey].Captures;
                var valueCaptures = item.Groups[ValueKey].Captures;

                var count = item.Groups[NameKey].Captures.Count;

                for (var i = 0; i < count; i++)
                {
                    var className = classNameCaptures[i].Value.TrimIfNotNull();
                    var value = valueCaptures[i].Value.TrimIfNotNull();

                    foreach(var part in selectorParts)
                    {
                        _dict[part][className] = value;
                    }
                }
            }
        }

        public IEnumerable<KeyValuePair<string,string>> LookupValues(string selector)
        {
            IEnumerable<KeyValuePair<string,string>> result
                = new KeyValuePair<string,string>[]{};
            if (_dict.ContainsKey(selector))
            {
                var subdict = _dict[selector];

                result = subdict.ToList();
            }

            return result;
        }

        public string LookupValue(string selector, string style)
        {
            string result = null;
            if (_dict.ContainsKey(selector))
            {
                var subdict = _dict[selector];

                if (subdict.ContainsKey(style))
                    result = subdict[style];
            }

            return result;
        }
    }
}

and it works fine with input like this:

        [TestMethod]
        public void TestParseMultipleElementNames()
        {
          const string css = @"h1, h2, h3, h4, h5, h6
{
    font-family: Georgia, 'Times New Roman', serif;
    color: #006633;
    line-height: 1.2em;
    font-weight: normal;
}
";

          var parser = new CSSParser(css);

          Assert.AreEqual("normal", parser.LookupValue("h4", "font-weight"));
        }

but when I run it with a css string containing no attributes:

        [TestMethod]
        public void TestParseNoAttributesStyle()
        {
          const string css = @"
#submenu-container
{
}
";

          var parser = new CSSParser(css);

          Assert.IsFalse(parser.LookupValues("#submenu-container").Any());
        }

the program is hanging on this line in CSSParser:

foreach (Match item in matches)

The debugger stops marking the currently executing line, the loop block itself is never reached.

Why is the MatchCollection hanging my program?

For completeness:

namespace Helpers.Extensions
{
  public static class StringExtension
  {
    public static string TrimIfNotNull(this string input)
    {
      return input != null ? input.Trim() : null;
    }
  }
}
like image 315
Anders Lindén Avatar asked Aug 02 '13 12:08

Anders Lindén


2 Answers

Your Regex is just inefficient and burning CPU. You can confirm this by a) looking at the CPU time used and b) pausing the debugger repeatedly and looking at the stack (will be in the bowels of the Regex engine).

like image 75
usr Avatar answered Nov 18 '22 20:11

usr


As far as I can tell .net goes into an eternal loop because it tries different approaches with the regex you've got (the GroupsPattern one) - I believe it makes a mistake somewhere. I've had a look at this regex and as far as I can tell you can easily remove two of the \s*, namely those that come before the negation groups respectively [^,{]+ and [^}:]+ since they already capture spaces.

So that is, instead of:

private const string GroupsPattern = @"(?<selector>(?:(?:[^,{]+)\s*,?\s*)+)\{(?:(?<name>[^}:]+)\s*:\s*(?<value>[^};]+);?\s*)*\}";

I'd have:

private const string GroupsPattern = @"(?<selector>(?:(?:[^,{]+),?\s*)+)\{(?:(?<name>[^}:]+):\s*(?<value>[^};]+);?\s*)*\}";

Now this is regex expressions, so the chances of me having overlooked something are rather large. Also I believe this also results in some of the named capture groups to perhaps have extra spaces in them (but it seems you trim them anyway).

Hope it is usable. Although it still takes quite some time it works with the example you gave.

like image 37
Ykok Avatar answered Nov 18 '22 21:11

Ykok