Recursive regex doesn’t work

Tags: regex, php

The string I'm working on looks like this:

abc {def ghi {jkl mno} pqr stv} xy z

I need to wrap whatever the curly braces contain in tags, so that the result looks like this:

abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z

I’ve tried

'#(?<!\pL)\{  ( ([^{}]+) | (?R) )*  \}(?!\pL)#xu'

but what I get is just <tag>xy z</tag>. Help please, what am I doing wrong?

tijagi asked Feb 19 '23 21:02


2 Answers

Nested structures are, by definition, beyond what regular expressions in the strict sense can describe (yes, PCRE supports recursion, but that does not help with this replacement problem). There are two options if you want to use regular expressions anyway. First, you could simply replace every opening brace with an opening tag and every closing brace with a closing tag. This, however, will convert unmatched braces as well:

$str = preg_replace('/\{/', '<tag>', $str);
$str = preg_replace('/\}/', '</tag>', $str);
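
To make that caveat concrete, here is a small sketch that runs the two replacements on the sample string from the question and on an unbalanced input. The helper name replaceAllBraces and the unbalanced test string are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: blindly converts every brace, matched or not.
function replaceAllBraces(string $str): string
{
    $str = preg_replace('/\{/', '<tag>', $str);
    $str = preg_replace('/\}/', '</tag>', $str);
    return $str;
}

echo replaceAllBraces('abc {def ghi {jkl mno} pqr stv} xy z'), "\n";
// abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z

echo replaceAllBraces('a } b { c'), "\n";
// a </tag> b <tag> c
```

The second call shows the problem: the stray braces are turned into tags that do not pair up.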

Another option is to only replace matching { and }, but then you have to do it repeatedly, because one call to preg_replace cannot replace multiple nested levels:

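do
{
    // Wrap the innermost {...} pairs; $count receives the number of replacements made.
    $str = preg_replace('/\{([^{]*?)\}/', '<tag>$1</tag>', $str, -1, $count);
}
while ($count > 0); // repeat until no matched pair is left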
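
As a rough illustration of why the loop has to run more than once, the sketch below records the string after every pass. The helper name tagPasses is hypothetical, and the character class [^{}]* is a slight variation on the pattern above (it excludes both kinds of brace instead of relying on a lazy quantifier); both are assumptions made for this example.

```php
<?php
// Hypothetical helper: returns the string after each pass that changed something.
function tagPasses(string $str): array
{
    $passes = [];
    do {
        // Only brace pairs without nested braces can match, i.e. the innermost ones.
        $str = preg_replace('/\{([^{}]*)\}/', '<tag>$1</tag>', $str, -1, $count);
        if ($count > 0) {
            $passes[] = $str;
        }
    } while ($count > 0);
    return $passes;
}

foreach (tagPasses('abc {def ghi {jkl mno} pqr stv} xy z') as $i => $s) {
    echo 'pass ', $i + 1, ': ', $s, "\n";
}
// pass 1: abc {def ghi <tag>jkl mno</tag> pqr stv} xy z
// pass 2: abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z
```

Each pass peels off one nesting level, so the number of passes equals the nesting depth. Unmatched braces never match the pattern and are left untouched.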

EDIT: While PCRE supports recursion with (?R), this will most likely not help with a replacement. The reason is that if a capturing group is repeated, its reference only contains the last capture (e.g. when matching /(a|b)+/ against aaaab, $1 will contain b). I suppose the same holds for recursion. That is why you can only replace the innermost match: it is the last match of the capturing group within the recursion. Likewise, you could not capture { and } within the recursion and replace those, because they might also be matched an arbitrary number of times, and only the last match would be replaced.
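
The behaviour of a repeated capturing group is easy to check with a minimal sketch; the variable name $m is just a placeholder for the match array:

```php
<?php
// The group (a|b) is repeated five times; only its last iteration is kept.
preg_match('/(a|b)+/', 'aaaab', $m);

echo $m[0], "\n"; // aaaab  (the whole match)
echo $m[1], "\n"; // b      (last capture of the repeated group)
```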

Just matching a correct nested syntax and then replacing the innermost or outermost matching brackets will not help either (with one preg_replace call), because multiple matches will never overlap (so if 3 nested brackets have been found, the inner 2 brackets themselves will be disregarded for further matches).
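
The limitation described here concerns a single preg_replace call. As a hedged sketch of one possible workaround, not something the answer proposes, a recursive pattern can match each outermost balanced group, and a callback can then process the captured contents with a nested call. The function name wrapBalanced is hypothetical, and the sketch assumes that group 1, which is not itself repeated, holds the full contents of the outermost braces.

```php
<?php
// Hypothetical helper: wraps balanced {...} groups in <tag>...</tag>.
function wrapBalanced(string $str): string
{
    // \{ ... \} with balanced contents; (?R) recurses into the whole pattern.
    $pattern = '/\{((?:[^{}]++|(?R))*)\}/';

    $result = preg_replace_callback(
        $pattern,
        // Each match is an outermost group; handle its contents with a nested call.
        function (array $m): string {
            return '<tag>' . wrapBalanced($m[1]) . '</tag>';
        },
        $str
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $str;
}

echo wrapBalanced('abc {def ghi {jkl mno} pqr stv} xy z'), "\n";
// abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z
```

Strictly speaking this still uses one regex call per nesting level, which is consistent with the point above: the overlapping inner groups are only reached because the callback starts a new search inside each match.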

Martin Ender answered Feb 21 '23 09:02


How about two steps:

s!{!<tag>!g;
s!}!</tag>!g;

(Perl syntax; translate to your language as appropriate)

or maybe this:

1 while s!{([^{}]*)}!<tag>$1</tag>!g;

Brian White answered Feb 21 '23 10:02