Regular expression to match balanced parentheses

Tags:

regex

I need a regular expression to select all the text between two outer brackets.

Example: some text(text here(possible text)text(possible text(more text)))end text

Result: (text here(possible text)text(possible text(more text)))

asked Feb 13 '09 at 15:02 by DaveF



1 Answer

I want to add this answer for quick reference. Feel free to update.


.NET Regex using balancing groups.

\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\) 

Where c is used as the depth counter.

Demo at Regexstorm.com

  • Stack Overflow: Using RegEx to balance match parenthesis
  • Wes' Puzzling Blog: Matching Balanced Constructs with .NET Regular Expressions
  • Greg Reinacker's Weblog: Nested Constructs in Regular Expressions
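The depth-counter idea behind the balancing-group pattern can be sketched outside of regex. This is only an illustration in Python, not a translation of the .NET engine: the function name outer_balanced is hypothetical, and unlike the group c above, the counter here also counts the outermost parenthesis.

```python
from typing import Optional


def outer_balanced(text: str) -> Optional[str]:
    """Return the first balanced (...) group in text, or None if there is none.

    Hypothetical helper for illustration; `depth` plays the role of the
    capture stack `c` in the .NET pattern.
    """
    start = text.find("(")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "(":
                depth += 1          # like pushing onto c
            elif text[i] == ")":
                depth -= 1          # like popping from c
                if depth == 0:      # counter empty again: group is balanced
                    return text[start:i + 1]
        # No balanced group starts here; retry from the next "(".
        start = text.find("(", start + 1)
    return None


example = "some text(text here(possible text)text(possible text(more text)))end text"
result = outer_balanced(example)
```

If no closing parenthesis brings the counter back to zero, the scan restarts at the next opening parenthesis, which is roughly what a regex engine does when a match attempt fails at one position.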

PCRE using a recursive pattern.

\((?:[^)(]+|(?R))*+\) 

Demo at regex101. Or without alternation:

\((?:[^)(]*(?R)?)*+\) 

Demo at regex101. Or unrolled for performance:

\([^)(]*+(?:(?R)[^)(]*)*+\) 

Demo at regex101. (?R) is shorthand for (?0) and recurses into the whole pattern, as if the pattern were pasted at that position.

Works in Perl, PHP, Notepad++, R (with perl=TRUE), and Python (the Regex package, with (?V1) for Perl behaviour).
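To make the recursion step more concrete, here is a rough Python sketch of what \((?:[^)(]+|(?R))*\) does, written as an ordinary recursive function using only the standard library. The names match_group and find_balanced are hypothetical, and the sketch ignores engine details such as possessive quantifiers and backtracking.

```python
from typing import Optional


def match_group(text: str, pos: int) -> Optional[int]:
    """If a balanced group starts at text[pos], return the index just past it."""
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1                                  # consume the opening \(
    while pos < len(text):
        if text[pos] == ")":
            return pos + 1                    # consume the closing \)
        if text[pos] == "(":
            end = match_group(text, pos)      # the (?R) step: recurse
            if end is None:
                return None
            pos = end
        else:
            pos += 1                          # one character of [^)(]+
    return None                               # input ended before the closing ")"


def find_balanced(text: str) -> Optional[str]:
    """Return the first balanced group found scanning left to right."""
    for start, ch in enumerate(text):
        if ch == "(":
            end = match_group(text, start)
            if end is not None:
                return text[start:end]
    return None


example = "some text(text here(possible text)text(possible text(more text)))end text"
result = find_balanced(example)
```

Each nested opening parenthesis triggers a fresh call, so very deep nesting could hit Python's recursion limit; regex engines have their own recursion limits as well.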


Ruby using subexpression calls.

With Ruby 2.0, \g<0> can be used to call the full pattern.

\((?>[^)(]+|\g<0>)*\) 

Demo at Rubular. Ruby 1.9 only supports capturing group recursion:

(\((?>[^)(]+|\g<1>)*\)) 

Demo at Rubular (atomic grouping since Ruby 1.9.3)


JavaScript API :: XRegExp.matchRecursive

XRegExp.matchRecursive(str, '\\(', '\\)', 'g'); 

For JS, Java, and other regex flavors without recursion, this pattern handles up to 2 levels of nesting:

\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\) 

Demo at regex101. Deeper nesting needs to be added to pattern.
To fail faster on unbalanced parentheses, drop the + quantifiers (use [^)(] instead of [^)(]+).
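Python's built-in re module is another flavor without recursion, so the bounded-nesting pattern applies there too. As a minimal sketch, the hypothetical helper nested_pattern below (not part of any library) builds the pattern for a chosen number of nesting levels instead of writing it out by hand; with levels=2 it produces exactly the pattern shown above.

```python
import re


def nested_pattern(levels: int) -> str:
    """Build a non-recursive pattern allowing `levels` levels of nesting.

    Hypothetical helper for illustration only.
    """
    pattern = r"\([^)(]*\)"  # innermost group: no parentheses inside
    for _ in range(levels):
        # Wrap the current pattern in one more level of parentheses.
        pattern = r"\((?:[^)(]+|" + pattern + r")*\)"
    return pattern


text = "some text(text here(possible text)text(possible text(more text)))end text"
match = re.search(nested_pattern(2), text)
result = match.group(0) if match else None
# result == "(text here(possible text)text(possible text(more text)))"
```

The generated pattern grows with every level, and input nested more deeply than levels allows will not match as a whole, so this only suits cases where the maximum depth is known in advance.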


Java: An interesting idea using forward references by @jaytea.


Reference - What does this regex mean?

  • rexegg.com - Recursive Regular Expressions
  • Regular-Expressions.info - Regular Expression Recursion

answered Sep 29 '22 at 23:09 by bobble bubble