As the title says, I want to match nested parentheses. Here is an example input:
(outer
(center
(inner)
(inner)
center)
ouer)
(outer
(inner)
ouer)
(outer
ouer)
Of course, the matched strings will then be processed recursively.
I want the first level of recursion to match:
[
(outer
(center
(inner)
(inner)
center)
ouer),
(outer
(inner)
ouer),
(outer
ouer)]
The processing after that goes without saying.
Many regex implementations will not let you match an arbitrary depth of nesting. However, Perl and PHP (through PCRE) support recursive patterns such as (?R), and .NET offers balancing groups for the same purpose.
A demo in Perl:
#!/usr/bin/perl -w
my $text = '(outer
(center
(inner)
(inner)
center)
ouer)
(outer
(inner)
ouer)
(outer
ouer)';
while($text =~ /(\(([^()]|(?R))*\))/g) {
    print("----------\n$1\n");
}
which will print:
----------
(outer
(center
(inner)
(inner)
center)
ouer)
----------
(outer
(inner)
ouer)
----------
(outer
ouer)
Or, the PHP equivalent:
<?php
$text = '(outer
(center
(inner)
(inner)
center)
ouer)
(outer
(inner)
ouer)
(outer
ouer)';
preg_match_all('/(\(([^()]|(?R))*\))/', $text, $matches);
print_r($matches);
which produces:
Array
(
[0] => Array
(
[0] => (outer
(center
(inner)
(inner)
center)
ouer)
[1] => (outer
(inner)
ouer)
[2] => (outer
ouer)
)
...
An explanation:
( # start group 1
\( # match a literal '('
( # group 2
[^()] # any char other than '(' and ')'
| # OR
(?R) # recursively match the entire pattern
)* # end group 2 and repeat zero or more times
\) # match a literal ')'
) # end group 1
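Python's built-in re module has no (?R), so as a rough illustration (a sketch, not part of either answer) the pattern's structure can be mirrored by a hand-written recursive matcher in which each nested call plays the role of one (?R) invocation. The names match_balanced and find_all_balanced are hypothetical, made up for this example.

```python
# Mirrors the pattern \(([^()]|(?R))*\) one piece at a time.
def match_balanced(s, i):
    """Return the index just past a balanced group starting at s[i], or None."""
    if i >= len(s) or s[i] != '(':
        return None                      # the pattern must begin with a literal '('
    i += 1
    while i < len(s):
        c = s[i]
        if c == ')':
            return i + 1                 # the literal ')' that closes group 1
        if c == '(':
            end = match_balanced(s, i)   # the (?R) alternative
            if end is None:
                return None              # inner group never closed
            i = end
        else:
            i += 1                       # the [^()] alternative
    return None                          # input ended before the closing ')'


def find_all_balanced(s):
    """Scan left to right like a global match, collecting non-overlapping groups."""
    matches = []
    i = 0
    while i < len(s):
        end = match_balanced(s, i)
        if end is None:
            i += 1                       # no match here; try the next position
        else:
            matches.append(s[i:end])
            i = end                      # resume right after the match
    return matches


print(find_all_balanced("x(a(b)c)y((d)"))  # ['(a(b)c)', '(d)']
```

Because each character can satisfy at most one alternative, the matcher never needs to backtrack, which is why a plain loop can stand in for the regex engine here. On the unclosed "((d)" it skips the first '(' and finds "(d)", as the regex would.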
Note @Goozak's comment:
A better pattern might be
\(((?>[^()]+)|(?R))*\) (from PHP: Recursive patterns). For my data, Bart's pattern was crashing PHP when it encountered a (long string) without nesting. This pattern went through all my data without problems.
Don't use regex.
Instead, a simple recursive function will suffice. Here's the general structure:
def recursive_bracket_parser(s, i):
    while i < len(s):
        if s[i] == '(':
            i = recursive_bracket_parser(s, i+1)
        elif s[i] == ')':
            return i+1
        else:
            # process whatever is at s[i]
            i += 1
    return i
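Applied to the question, the same skeleton can return the first-level groups the asker wanted: whenever a '(' appears at the top level, one call to the recursive function skips past its matching ')', and the slice between the two positions is one group. This is a sketch assuming balanced input; split_top_level is a hypothetical name, and the skeleton is repeated so the block runs on its own.

```python
def recursive_bracket_parser(s, i):
    # Skeleton from above: returns the index just past the ')' that closes
    # the group whose contents start at s[i].
    while i < len(s):
        if s[i] == '(':
            i = recursive_bracket_parser(s, i + 1)
        elif s[i] == ')':
            return i + 1
        else:
            i += 1
    return i


def split_top_level(s):
    """Return the outermost parenthesised groups of s as strings (hypothetical helper)."""
    groups = []
    i = 0
    while i < len(s):
        if s[i] == '(':
            end = recursive_bracket_parser(s, i + 1)
            groups.append(s[i:end])
            i = end
        else:
            i += 1  # text between top-level groups (e.g. newlines) is skipped
    return groups


text = """(outer
(center
(inner)
(inner)
center)
ouer)
(outer
(inner)
ouer)
(outer
ouer)"""

for group in split_top_level(text):
    print("----------")
    print(group)
```

Each returned string can then be stripped of its outer parentheses and passed back to split_top_level for the next level of recursion. If a '(' is never closed, the parser runs to the end of the string and the last group is simply the rest of the input.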
For example, here's a function that will parse the input into a nested list structure:
def parse_to_list(s, i=0):
    result = []
    while i < len(s):
        if s[i] == '(':
            i, r = parse_to_list(s, i+1)
            result.append(r)
        elif s[i] == ')':
            return i+1, result
        else:
            result.append(s[i])
            i += 1
    return i, result
Calling parse_to_list('((a) ((b)) ((c)(d)))efg') returns the tuple (23, [[['a'], ' ', [['b']], ' ', [['c'], ['d']]], 'e', 'f', 'g']); the second element is the nested list.
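One caveat with the recursive version is that every '(' adds a Python stack frame, so very deep nesting (around a thousand levels under CPython's default recursion limit) raises RecursionError. As a hedged alternative, the same nested list can be built iteratively with an explicit stack. parse_to_list_iterative is a hypothetical name, and unlike the original it returns only the list, not the index.

```python
def parse_to_list_iterative(s):
    """Build the same nested list as parse_to_list(s)[1], without recursion."""
    stack = [[]]                   # stack[-1] is the list currently being filled
    for ch in s:
        if ch == '(':
            child = []
            stack[-1].append(child)  # attach the new group to its parent
            stack.append(child)      # and make it the current list
        elif ch == ')':
            if len(stack) > 1:
                stack.pop()          # close the current group
            else:
                break                # stray top-level ')' stops parsing, like the recursive version
        else:
            stack[-1].append(ch)
    return stack[0]


print(parse_to_list_iterative('((a) ((b)) ((c)(d)))efg'))
# [[['a'], ' ', [['b']], ' ', [['c'], ['d']]], 'e', 'f', 'g']
```

The explicit stack grows on the heap, so an input such as '(' * 5000 + ')' * 5000 parses without hitting the recursion limit (printing such a deep result would still fail, because repr of nested lists is itself recursive).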