Recursive regex with text before nested parentheses

I have the following text:

$text = 'This is a test to see if something(try_(this(once))) works';

I need to extract something(try_(this(once))) from the text with a regex. I have the following issue:

  • My nesting will not remain constant; my text can be

    • something(try_(this(once))) or
    • something(try_this(once)) or
    • something(try_thisonce)

I have tried a number of regexes found across the site, but cannot get any of them working. Here is the closest I have come:

EXAMPLE 1:

$text = 'This is a test to see if something(try_(this(once))) works';
$output = preg_match_all('/(\(([^()]|(?R))*\))/', $text, $out);
?><pre><?php var_dump($out[0]); ?></pre><?php   

This outputs:

array(1) {
  [0]=>
  string(18) "(try_(this(once)))"
}

No matter where I add the word something (for example '/something(\(([^()]|(?R))*\))/' and '/(\something(([^()]|(?R))*\))/'), I get an empty array or NULL.

EXAMPLE 2:

$text2 = 'This is a test to see if something(try_(this(once))) works';
$output2 = preg_match_all('/something\((.*?)\)/', $text2, $out2);
?><pre><?php var_dump($out2[0]); ?></pre><?php  

With this code I do get the word something back:

array(1) {
  [0]=>
  string(25) "something(try_(this(once)"
}

but the lazy .*? stops and returns at the first closing ), which is expected, as this is not a recursive expression.

How do I recursively match and return nested parentheses with the word something before the first opening (? And, if possible, how do I handle the case where there may or may not be whitespace between the word something and the opening (, for example

  • something(try_(this(once))) or
  • something (try_(this(once)))
Pieter Goosen asked Oct 05 '15 17:10



2 Answers

(?R) isn't a magic incantation for obtaining a pattern that handles balanced constructs (like parentheses). (?R) is the same as (?0): it is an alias for "capture group zero", in other words, the whole pattern.
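To check the alias claim, here is a small PHP sketch. It runs the same balanced-parentheses pattern through `preg_match_all` once with `(?R)` and once with `(?0)` on the question's string; the helper name `matchBalanced` is hypothetical, used only for illustration. The two result arrays should be identical.

```php
<?php
// Sketch: (?R) and (?0) both mean "recurse into the whole pattern",
// so these two patterns are expected to behave identically.

// Hypothetical helper: returns every full match of $pattern in $subject.
function matchBalanced(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $out);
    return $out[0];
}

$text = 'This is a test to see if something(try_(this(once))) works';

$withR    = matchBalanced('/\((?:[^()]|(?R))*\)/', $text);
$withZero = matchBalanced('/\((?:[^()]|(?0))*\)/', $text);

var_dump($withR === $withZero); // bool(true)
print_r($withR);                // only the outermost balanced group
```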

In the same way, you can use (?1), (?2), etc. as aliases for the sub-patterns in groups 1, 2, etc.

As an aside, note that (?0) and (?R) are obviously always inside their own sub-pattern, since that sub-pattern is the whole pattern. (?1), (?2), etc., on the other hand, induce recursion only when they appear inside their own groups; elsewhere they are just a way to reuse part of a pattern without rewriting it.
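A minimal PHP sketch of the non-recursive case. The `ddd-ddd` input format is invented purely for illustration: `(?1)` sits outside group 1, so it does not recurse; it just reuses `\d{3}` a second time.

```php
<?php
// Sketch: (?1) written OUTSIDE group 1 is a subroutine call, not a
// recursion: it simply reuses the sub-pattern \d{3} a second time.
// The "ddd-ddd" format is an invented example.
$pattern = '/^(\d{3})-(?1)$/';

$ok  = preg_match($pattern, '123-456', $m); // (?1) matches "456" as \d{3}
$bad = preg_match($pattern, '123-45');      // "45" has only two digits

var_dump($ok, $bad); // int(1) int(0)

// Group 1 still holds what it captured itself; the subroutine call
// does not overwrite it.
var_dump($m[1]); // string(3) "123"
```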

something\((?:[^()]|(?R))*\) doesn't work because it requires every opening parenthesis, nested or not, to be preceded by something in your string.
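The failure can be reproduced with a short PHP sketch. The second subject string is an invented example in which every nested parenthesis happens to be preceded by something, which is the only kind of input this pattern can match.

```php
<?php
// Sketch: with (?R) the WHOLE pattern, including the literal "something",
// is required again at every nesting level.
$pattern = '/something\((?:[^()]|(?R))*\)/';

// The nested "(" here is not preceded by "something", so nothing matches.
$plain = preg_match($pattern, 'This is a test to see if something(try_(this(once))) works', $m1);

// Invented input: the nested "(" IS preceded by "something", so it matches.
$odd = preg_match($pattern, 'see if something(try_something(once)) works', $m2);

var_dump($plain); // int(0)
var_dump($m2[0]); // string(30) "something(try_something(once))"
```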

In conclusion, you can't use (?R) here; you need to create a capture group that handles only the nested parentheses:

(\((?:[^()]|(?1))*\))

This can be written more efficiently as:

(\([^()]*(?:(?1)[^()]*)*+\))
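One way to convince yourself the two forms agree is to run both over a few inputs. This is a sketch: `allMatches` is a hypothetical helper, the first three subjects come from the question, and the last one is an invented unbalanced string.

```php
<?php
// Sketch: the basic pattern and the "more efficient" one should return the
// same balanced groups. In the second, [^()]* swallows a whole run of
// ordinary characters at once, and the possessive *+ stops the engine from
// backtracking into iterations that already succeeded.
$basic     = '/(\((?:[^()]|(?1))*\))/';
$efficient = '/(\([^()]*(?:(?1)[^()]*)*+\))/';

// Hypothetical helper: all full matches of $pattern in $subject.
function allMatches(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $out);
    return $out[0];
}

$subjects = [
    'something(try_(this(once)))',
    'something(try_this(once))',
    'something(try_thisonce)',
    'unbalanced((x) here', // only the inner "(x)" is balanced
];

foreach ($subjects as $s) {
    $same = allMatches($basic, $s) === allMatches($efficient, $s);
    printf("%-28s %s\n", $s, $same ? 'same' : 'DIFFERENT');
}
```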

Finally, you only need to add something in front, where it is no longer included in the recursion:

something(\([^()]*(?:(?1)[^()]*)*+\))
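Applied to the inputs from the question, the final pattern could be used like this (a sketch; `extractSomething` is a hypothetical helper). The `\s*` variant at the end is not part of this answer: it is an assumption addressing the question's optional-whitespace case, placed outside group 1 so the recursion is unaffected.

```php
<?php
// Sketch: the answer's final pattern applied to the question's inputs.
// "something" sits outside group 1, so (?1) only repeats the
// balanced-parentheses part.
$pattern = '/something(\([^()]*(?:(?1)[^()]*)*+\))/';

// Hypothetical helper: returns the whole match, or null when nothing matches.
function extractSomething(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$inputs = [
    'This is a test to see if something(try_(this(once))) works',
    'This is a test to see if something(try_this(once)) works',
    'This is a test to see if something(try_thisonce) works',
];

foreach ($inputs as $text) {
    echo extractSomething($pattern, $text), PHP_EOL;
}

// NOT from the answer: an assumed variant for the optional whitespace.
// \s* sits between "something" and group 1, outside the recursion.
$spaced = '/something\s*(\([^()]*(?:(?1)[^()]*)*+\))/';
echo extractSomething($spaced, 'see if something (try_(this(once))) works'), PHP_EOL;
```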

Note that if something is a sub-pattern with an undetermined number of capture groups, it is more convenient to refer to the last opened capture group with a relative reference, like this:

som(eth)ing(\([^()]*(?:(?-1)[^()]*)*+\))
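A PHP sketch of the relative reference at work, using the question's text: `(eth)` becomes group 1, the parentheses become group 2, and `(?-1)` still points at the parentheses group without you having to count groups.

```php
<?php
// Sketch: "som(eth)ing" adds a capture group in front, shifting the
// balanced-parentheses group to number 2. (?-1) means "the most recently
// opened group before this point", i.e. that group, whatever its number.
$pattern = '/som(eth)ing(\([^()]*(?:(?-1)[^()]*)*+\))/';

preg_match($pattern, 'see if something(try_(this(once))) works', $m);

var_dump($m[1]); // string(3) "eth"
var_dump($m[2]); // string(18) "(try_(this(once)))"
```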
Casimir et Hippolyte answered Nov 14 '22 22:11



[^() ]*(\((?:[^()]|(?1))*\))

You need to use (?1) instead of (?R): (?1) recurses into the first subpattern only. See the demo:

https://regex101.com/r/cJ6zQ3/4
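A PHP sketch of the same pattern with `preg_match_all`, run on the question's sample text. Note that `[^() ]*` accepts whatever word directly precedes the parentheses, not specifically something.

```php
<?php
// Sketch: [^() ]* grabs the run of non-space, non-parenthesis characters
// just before the balanced group; (?1) recurses into group 1 only.
$pattern = '/[^() ]*(\((?:[^()]|(?1))*\))/';

$text = 'This is a test to see if something(try_(this(once))) works';
preg_match_all($pattern, $text, $out);

print_r($out[0]); // full matches, preceding word included
print_r($out[1]); // group 1: the parentheses only
```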

vks answered Nov 14 '22 21:11
