Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex for checking if a string has mismatched parentheses?

Tags:

regex

php

In a PHP script, what regex should I use to check for mismatched parentheses in a string? Things that I want to allow include:

  • This is (ok)
  • This (is) (ok)

Things I want to prevent:

  • This is )bad(
  • This is also (bad
  • This is (bad (too)

Thanks!

Update: You guys all rock. Doing this with a regex seemed trickier than it should have, and these kinds of 2nd level answers are what makes stackoverflow beautiful. Thanks for the links and the pseudocode. I'm not sure who to give the answer to, so I apologize to everyone whose answers I can't accept.

like image 927
twk Avatar asked Feb 18 '09 20:02

twk


2 Answers

Regex is not the right tool for the job. Scan a string manually.

Pseudo-code:

depth = 0
for character in some_string:
    depth += character == '('
    depth -= character == ')'
    if depth < 0:
       break

if depth != 0:
   print "unmatched parentheses"
like image 187
jfs Avatar answered Oct 07 '22 01:10

jfs


You can do this with a regular expression -- PCRE, as used by PHP, allows recursive patterns. The PHP Manual gives an example that is almost exactly what you want:

\(((?>[^()]+)|(?R))*\)

This matches any correctly parenthesised substring as long as it begins and ends with parentheses. If you want to ensure the entire string is balanced, allowing strings like "wiggedy(wiggedy)(wiggedy(wack))", here's what I came up with:

^((?:[^()]|\((?1)\))*+)$

Here's an explanation of the pattern, which may be more illuminating than obfuscatory:

^             Beginning of the string
(             Start the "balanced substring" group (to be called recursively)
  (?:         Start the "minimal balanced substring" group
    [^()]     Minimal balanced substring is either a non-paren character
    |         or
    \((?1)\)  a set of parens containing a balanced substring
  )           Finish the "minimal balanced substring" group
  *           Our balanced substring is a maximal sequence of minimal
              balanced substrings
  +           Don't backtrack once we've matched a maximal sequence
)             Finish the "balanced substring" pattern
$             End of the string

There are lots of considerations of efficiency and correctness that come up with these sorts of regexes. Be careful.

like image 32
Bennett McElwee Avatar answered Oct 06 '22 23:10

Bennett McElwee