Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Controlling balanced parenthesis

Tags:

c#

.net

regex

Assuming we have a string containing different parentheses (by parentheses in this context I mean (, [, and { ) the string could also contain other content like so {[balanced(parenthesis)]} but the rest of the content will largely be ignored. Would it be possible to use regular expression to control that all different parenthesis are:

  1. "closed", that is that each "opening" parenthesis is matched by a "closing" parenthesis, and
  2. at the right position in the string?
like image 429
Overly Excessive Avatar asked Oct 22 '14 07:10

Overly Excessive


2 Answers

It turns out this problem has been solved by Kobi in this blog post of his. For simply balancing different types of brackets, the solution is quite simple and elegant, and not as crazy as the answers in this question, which has a more complex grammar.

Let me quote the essential part of the blog post below:

[...] who says I must push what I matched to the stack? What if wanted to push an arbitrary string to the stack? When I see an open bracket I really want to push a closing bracket – but how can I?

The trick is to use a look-ahead: find the next occurrence of the character I want to push, and push it:

{(?=.*?(<Stack>}))

Next, when I want to match a closing brace, I already have the right one in stack. Using this approach, here’s a regex that matches tokens with matching balanced parentheses of 3 different typed:

(
    [^(){}\[\]]+
    | \( (?=[^)]*  (?<Stack> \) ) )
    | \[ (?=[^\]]* (?<Stack> \] ) )
    | \{ (?=[^}]*  (?<Stack> \} ) )
    | \k<Stack> (?<-Stack>)
)+?
(?(Stack) (?!))

Of course, this approach does have limitations – you may not find the character you’d like to push (which might be a good thing – it allows you to fail early). It also gets much trickier if you want to balance something more complicated than constant strings, but that’s another subject.

like image 57
2 revs Avatar answered Nov 16 '22 20:11

2 revs


It is possible to do this with the regex below, albeit a bit ugly.

This is a matching regex rather than a validation regex, but you can add anchor to turn it into a validation one by adding anchors at the beginning and the end.

(?:
    (?>[^(){}\[\]]+)
    |
    (?<open>
      (?=(?<mr>\())(?<mc>)(?<ms>)\(|
      (?=(?<mc>\{))(?<mr>)(?<ms>)\{|
      (?=(?<ms>\[))(?<mc>)(?<mr>)\[
    )
    |
    (?:
        (?<-open>
          (?!\k<mc>)\}
          |
          (?!\k<mr>)\)
          |
          (?!\k<ms>)\]
        )
        (?<-mr>)(?<-mc>)(?<-ms>)
    )
)+(?(open)(?!))

Since we can't read the top stack, we would have to emulate it by the 3 capturing groups mr, mc and ms. The number of items in mr, mc, ms and open are always the same. When the stack open is not empty, only one out of the 3 capturing groups contains the corresponding opening bracket, the other 2 capture an empty string. The non-empty string capturing group is always the type of the bracket at the top of the stack.

This allows us to match the corresponding closing bracket by asserting that the corresponding captured group can't be matched, e.g. (?!\k<mc>)\}. We know that the group can't be empty (since it has to pass the check on (?<-open>) first). There are only 2 cases left:

  • If the stack top is an empty string, then the backreference will always match, and the assertion always fails.
  • If the stack top contains the corresponding opening bracket, then the backreference will fail, and the assertion succeeds, and we proceed to match the closing bracket.
like image 4
nhahtdh Avatar answered Nov 16 '22 20:11

nhahtdh