How can I specify the priority of a match pattern in a Regex?

Tags: c#, .net, regex, parsing

I'm writing a function-parsing engine which uses regular expressions to separate the individual terms (defined as a constant or a variable followed (optionally) by an operator). It's working great, except when I have grouped terms within other grouped terms. Here's the code I'm using:

//This matches an opening delimiter
Regex openers = new Regex("[\\[\\{\\(]");

//This matches a closing delimiter
Regex closers = new Regex("[\\]\\}\\)]");

//This matches the name of a variable (\w+) or a constant numeric value (\d+(\.\d+)?)
Regex VariableOrConstant = new Regex("((\\d+(\\.\\d+)?)|\\w+)" + FunctionTerm.opRegex + "?");

//This matches the binary operators +, *, -, or /
Regex ops = new Regex("[\\*\\+\\-/]");

//This compound Regex finds a single variable or constant term (including a trailing operator,
//if any) OR a group containing multiple terms (and their trailing operators, if any)
//followed by a trailing operator, if any.
//Matches of the group pattern need to be added to the function as sub-functions,
//not as individual terms, to ensure the correct evaluation order with parentheses.
Regex splitter = new Regex(
openers + 
"(" + VariableOrConstant + ")+" + closers + ops + "?" +
"|" +
"(" + VariableOrConstant + ")" + ops + "?");

When "splitter" is matched against the string "4/(2*X*[2+1])", the matches' values are "4/", "2*", "X*", "2+", and "1", completely ignoring all of the delimiting parentheses and brackets. I believe this is because the second half of the "splitter" Regex (the part after the "|") is being matched and overriding the first half. This is bad: I want grouped expressions to take precedence over single terms. Does anyone know how I can do this? I looked into positive/negative lookaheads and lookbehinds, but I'm honestly not sure how to use them, or even what they're for, and I can't find any relevant examples. Thanks in advance.


asked Dec 13 '10 01:12

Michael Hoffmann

2 Answers

You didn't show us how you're applying the regex, so here's a demo I whipped up:

private static void ParseIt(string subject)
{
  Console.WriteLine("subject : {0}\n", subject);

  Regex openers = new Regex(@"[[{(]");
  Regex closers = new Regex(@"[]})]");
  Regex ops = new Regex(@"[*+/-]");
  Regex VariableOrConstant = new Regex(@"((\d+(\.\d+)?)|\w+)" + ops + "?");

  Regex splitter = new Regex(
    openers + @"(?<FIRST>" + VariableOrConstant + @")+" + closers + ops + @"?" +
    @"|" +
    @"(?<SECOND>" + VariableOrConstant + @")" + ops + @"?",
    RegexOptions.ExplicitCapture
  );

  foreach (Match m in splitter.Matches(subject))
  {
    foreach (string s in splitter.GetGroupNames())
    {
      Console.WriteLine("group {0,-8}: {1}", s, m.Groups[s]);
    }
    Console.WriteLine();
  }
}

output:

subject : 4/(2*X*[2+1])

group 0       : 4/
group FIRST   :
group SECOND  : 4/

group 0       : 2*
group FIRST   :
group SECOND  : 2*

group 0       : X*
group FIRST   :
group SECOND  : X*

group 0       : [2+1]
group FIRST   : 1
group SECOND  :

As you can see, the term [2+1] is matched by the first part of the regex, as you intended. It can't do anything with the (, though, because the next bracketing character after that is another "opener" ([), and it's looking for a "closer".
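The FIRST group prints only 1 for [2+1] because a quantified capturing group reports just its last iteration through Groups[...]. .NET keeps the earlier iterations in the group's Captures collection. A minimal sketch, assuming only the grouped half of the demo pattern is of interest; CaptureDemo and FirstGroupCaptures are illustrative names, not part of the demo above:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Returns every value the FIRST group captured during the first
    // successful match, not just the last iteration.
    public static string[] FirstGroupCaptures(string subject)
    {
        Regex grouped = new Regex(
            @"[[{(](?<FIRST>((\d+(\.\d+)?)|\w+)[*+/-]?)+[]})][*+/-]?",
            RegexOptions.ExplicitCapture);

        Match m = grouped.Match(subject);

        // Group.Captures holds one Capture per iteration of the quantified group.
        return m.Groups["FIRST"].Captures
                .Cast<Capture>()
                .Select(c => c.Value)
                .ToArray();
    }
}
```

For the subject 4/(2*X*[2+1]) this should return the two captures 2+ and 1. Wrapping the repeated part inside the named group instead, as in (?<FIRST>(...)+), would capture 2+1 as a single value.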

You could use .NET's "balanced matching" feature to allow for grouped terms enclosed in other groups, but it's not worth the effort. Regexes are not designed for parsing--in fact, parsing and regex matching are fundamentally different kinds of operation. And this is a good example of the difference: a regex actively seeks out matches, skipping over anything it can't use (like the open-parenthesis in your example), but a parser has to examine every character (even if it's just to decide to ignore it).
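For reference, a rough sketch of what balanced matching looks like, using .NET's balancing group syntax. It is deliberately limited: it assumes round parentheses only (square brackets and braces are treated as ordinary characters), and BalancedDemo, FirstGroup, and the DEPTH group name are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedDemo
{
    // Sketch only: balances ( and ), ignoring [] and {}.
    private static readonly Regex Balanced = new Regex(@"
        \(                    # outermost opening parenthesis
        (?>                   # atomic group, repeated:
            [^()]+            #   a run of non-parenthesis characters
          | \( (?<DEPTH>)     #   nested opener: push onto the DEPTH stack
          | \) (?<-DEPTH>)    #   nested closer: pop from the DEPTH stack
        )*
        (?(DEPTH)(?!))        # fail if an opener is still unclosed
        \)                    # outermost closing parenthesis
        ",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the first balanced parenthesized group, or null if there is none.
    public static string FirstGroup(string subject)
    {
        Match m = Balanced.Match(subject);
        return m.Success ? m.Value : null;
    }
}
```

Given 4/(2*X*(2+1)), FirstGroup should return (2*X*(2+1)) as one match. Extending this to three bracket types that must pair up correctly makes the pattern considerably harder to read, which is part of why it is rarely worth the effort.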

About the demo: I tried to make the minimum functional changes necessary to get your code to work (which is why I didn't correct the error of putting the + outside the capturing group), but I also made several surface changes, and those represent active recommendations. To wit:

Always use verbatim string literals (@"...") when creating regexes in C# (I think the reason is obvious).
If you're using capturing groups, use named groups whenever possible, but don't use named groups and numbered groups in the same regex. Named groups save you the hassle of keeping track of what's captured where, and the ExplicitCapture option saves you having to clutter up the regex with (?:...) wherever you need a non-capturing group.

Finally, that whole scheme of building a large regex from a bunch of smaller regexes has very limited usefulness IMO. It's very difficult to keep track of the interactions between the parts, like which part's inside which group. Another advantage of C#'s verbatim strings is that they're multiline, so you can take advantage of free-spacing mode (a.k.a. /x or COMMENTS mode):

  Regex r = new Regex(@"
    (?<GROUPED>
      [[{(]                  # opening bracket
      (                      # group containing:
        ((\d+(\.\d+)?)|\w+)     # number or variable
        [*+/-]?                 # and optional trailing operator
      )+                     # ...one or more times
      []})]                  # closing bracket
      [*+/-]?                # and optional trailing operator
    )
    |
    (?<UNGROUPED>
      ((\d+(\.\d+)?)|\w+)    # number or variable
      [*+/-]?                # and optional trailing operator
    )
    ",
    RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace
  );

This is not intended as a solution to your problem; as I said, that's not a job for regexes. This is just a demonstration of some useful regex techniques.
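To illustrate the difference between matching and parsing described above, here is a minimal recursive-descent sketch that walks the input one character at a time instead of searching for matches. It rests on several assumptions and is not the asker's FunctionTerm engine: TinyParser, Evaluate, and the variable dictionary are hypothetical names, whitespace is not handled, and unary minus is not supported.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: evaluates +, -, *, / with (), [] and {} as grouping.
public sealed class TinyParser
{
    private const string Openers = "([{";
    private const string Closers = ")]}";

    private readonly string _text;
    private readonly IDictionary<string, double> _vars;
    private int _pos;

    private TinyParser(string text, IDictionary<string, double> vars)
    {
        _text = text;
        _vars = vars;
    }

    public static double Evaluate(string text, IDictionary<string, double> vars)
    {
        TinyParser p = new TinyParser(text, vars);
        double value = p.ParseExpression();
        if (p._pos != text.Length)   // every character must be accounted for
            throw new FormatException("Unexpected '" + text[p._pos] + "' at index " + p._pos);
        return value;
    }

    private bool NextIs(string candidates)
    {
        return _pos < _text.Length && candidates.IndexOf(_text[_pos]) >= 0;
    }

    // expression := term (('+' | '-') term)*
    private double ParseExpression()
    {
        double value = ParseTerm();
        while (NextIs("+-"))
        {
            char op = _text[_pos++];
            double rhs = ParseTerm();
            value = op == '+' ? value + rhs : value - rhs;
        }
        return value;
    }

    // term := factor (('*' | '/') factor)*
    private double ParseTerm()
    {
        double value = ParseFactor();
        while (NextIs("*/"))
        {
            char op = _text[_pos++];
            double rhs = ParseFactor();
            value = op == '*' ? value * rhs : value / rhs;
        }
        return value;
    }

    // factor := number | variable | opener expression matching-closer
    private double ParseFactor()
    {
        if (NextIs(Openers))
        {
            char closer = Closers[Openers.IndexOf(_text[_pos++])];
            double inner = ParseExpression();   // recursion handles nesting
            if (_pos >= _text.Length || _text[_pos] != closer)
                throw new FormatException("Expected '" + closer + "' at index " + _pos);
            _pos++;
            return inner;
        }

        int start = _pos;
        if (_pos < _text.Length && char.IsDigit(_text[_pos]))
        {
            while (_pos < _text.Length && (char.IsDigit(_text[_pos]) || _text[_pos] == '.'))
                _pos++;
            return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
        }

        while (_pos < _text.Length && (char.IsLetterOrDigit(_text[_pos]) || _text[_pos] == '_'))
            _pos++;
        if (_pos == start)
            throw new FormatException("Expected a term at index " + _pos);
        return _vars[_text.Substring(start, _pos - start)];   // throws if the variable is unknown
    }
}
```

With X bound to 2, Evaluate("4/(2*X*[2+1])", vars) works out 4/(2*2*3), i.e. one third, and a mismatched closer such as (1] raises a FormatException. Grouped terms take precedence simply because ParseFactor recurses into them, so no priority between alternatives has to be expressed.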


answered Nov 04 '22 03:11

Alan Moore

Try using different quantifiers.

greedy:

*  +  ?

possessive (not supported by .NET's Regex; an atomic group (?>...) is the closest equivalent there):

*+ ++ ?+

lazy:

*? +? ??

Try reading this and this

Also consider a non-capturing group:

(?:your expr here)
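A small sketch of how greedy and lazy quantifiers differ in C#, plus what a non-capturing group changes. The class and method names (QuantifierDemo, Greedy, Lazy, GroupCount) and the sample patterns are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Greedy: .+ takes as much as possible, then backs off to the LAST ']'.
    public static string Greedy(string subject)
    {
        return Regex.Match(subject, @"\[.+\]").Value;
    }

    // Lazy: .+? takes as little as possible, stopping at the FIRST ']'.
    public static string Lazy(string subject)
    {
        return Regex.Match(subject, @"\[.+?\]").Value;
    }

    // Non-capturing: (?:...) groups the optional fraction without adding
    // a numbered group, so only group 0 (the whole match) is reported.
    public static int GroupCount(string subject)
    {
        return Regex.Match(subject, @"\d+(?:\.\d+)?").Groups.Count;
    }
}
```

On the input [2+1]*[3+4], Greedy returns the whole string while Lazy returns only [2+1]; GroupCount("3.14") is 1.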

Keep trying; practice makes perfect! :)

answered Nov 04 '22 05:11

marverix
