I'm trying to get a PHP regex to work that extracts the text inside brackets from a string while ignoring any nested brackets:
Let's say I want
Lorem ipsum [1. dolor sit amet, [consectetuer adipiscing] elit.]. Aenean commodo ligula eget dolor.[2. Dolor, [consectetuer adipiscing] elit.] Aenean massa[3. Lorem ipsum] dolor.
to return
[1] => "dolor sit amet, [consectetuer adipiscing] elit."
[2] => "Dolor, [consectetuer adipiscing] elit."
[3] => "Lorem ipsum"
So far I've got
'/\[([0-9]+)\.\s([^\]]+)\]/gi'
but it breaks when nested brackets occur. See demo
How can I keep the inner brackets from ending the match? Thanks in advance!
You can use recursive references to previous groups:
(?<no_brackets>[^\[\]]*){0}(?<balanced_brackets>\[\g<no_brackets>\]|\[(?:\g<no_brackets>\g<balanced_brackets>\g<no_brackets>)*\])
See it in action
The idea is to define the desired match as either a pair of [] enclosing text that contains no brackets, or a pair of [] enclosing a sequence of bracket-free text and balanced bracket groups, where each of those inner groups is matched recursively by the same rule.
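As a rough sketch of how that pattern could be applied to the sample string from the question: the pattern itself only finds balanced bracket groups, so this example adds a second, simpler regex (my own addition, not part of the answer) to split each top-level group into its item number and text. The ~ delimiters and the variable names are assumptions made for illustration.

```php
<?php
// Sample string from the question.
$text = 'Lorem ipsum [1. dolor sit amet, [consectetuer adipiscing] elit.]. '
      . 'Aenean commodo ligula eget dolor.[2. Dolor, [consectetuer adipiscing] elit.] '
      . 'Aenean massa[3. Lorem ipsum] dolor.';

// The pattern from the answer, wrapped in ~ delimiters.
// (?<no_brackets>...){0} only defines the group; it is never matched in place.
$balanced = '~(?<no_brackets>[^\[\]]*){0}'
          . '(?<balanced_brackets>\[\g<no_brackets>\]'
          . '|\[(?:\g<no_brackets>\g<balanced_brackets>\g<no_brackets>)*\])~';

$result = [];
if (preg_match_all($balanced, $text, $matches)) {
    foreach ($matches['balanced_brackets'] as $block) {
        // Hypothetical second step: split "[N. text]" into number and text.
        // Blocks without a leading item number are skipped.
        if (preg_match('~^\[(\d+)\.\s(.*)\]$~s', $block, $m)) {
            $result[$m[1]] = $m[2];
        }
    }
}

print_r($result);
```

Only the outermost groups show up in the loop, because preg_match_all resumes searching after the end of each match, so the nested [consectetuer adipiscing] is consumed as part of its parent.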
You can use this pattern, which captures the item number and the following text in two different groups. If you are sure all item numbers are unique, you can build the associative array described in your question with a simple array_combine:
$pattern = '~\[ (?:(\d+)\.\s)? ( [^][]*+ (?:(?R) [^][]*)*+ ) ]~x';
if (preg_match_all($pattern, $text, $matches))
$result = array_combine($matches[1], $matches[2]);
Pattern details:
~ # pattern delimiter
\[ # literal opening square bracket
(?:(\d+)\.\s)? # optional item number (*)
( # capture group 2
[^][]*+ # all that is not a square bracket (possessive quantifier)
(?: # non-capturing group
(?R) # recursion: (?R) is an alias for the whole pattern
[^][]* # all that is not a square bracket
)*+ # repeat zero or more times (possessive quantifier)
)
] # literal closing square bracket
~x # free spacing mode
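For reference, here is a self-contained run of the pattern against the sample string from the question. It is only a sketch: the $text value is copied from the question, and the empty-array fallback for the no-match case is an assumption of mine.

```php
<?php
// Sample string from the question.
$text = 'Lorem ipsum [1. dolor sit amet, [consectetuer adipiscing] elit.]. '
      . 'Aenean commodo ligula eget dolor.[2. Dolor, [consectetuer adipiscing] elit.] '
      . 'Aenean massa[3. Lorem ipsum] dolor.';

$pattern = '~\[ (?:(\d+)\.\s)? ( [^][]*+ (?:(?R) [^][]*)*+ ) ]~x';

// Fall back to an empty array when nothing matches (assumption for this sketch).
$result = [];
if (preg_match_all($pattern, $text, $matches)) {
    // $matches[1] holds the item numbers, $matches[2] the bracket contents.
    $result = array_combine($matches[1], $matches[2]);
}

print_r($result);
```

The nested brackets stay inside the captured text because values captured during a recursive call are discarded when the recursion returns, so groups 1 and 2 always describe the outermost match.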
(*) Note that the item number part must be optional if you want to be able to use the recursion with (?R), because a nested group such as [consectetuer adipiscing] doesn't have an item number. This can be a problem if you want to skip square brackets that have no item number. In that case you can build a more robust pattern by changing the optional group (?:(\d+)\.\s)? to a conditional statement: (?(R)|(\d+)\.\s)
Conditional statement:
(?(R) # IF you are in a recursion
# THEN match this (nothing in our case)
| # ELSE
(\d+)\.\s # match the item number, a dot and a whitespace character
)
This way the item number becomes mandatory at the top level, while nested brackets can still omit it.
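To illustrate the difference, the sketch below runs both variants on a made-up string that contains a top-level bracket without an item number. The test string and the variable names are hypothetical and chosen only for this comparison.

```php
<?php
// Hypothetical input: "[plain note]" is a top-level bracket without an item number.
$text = 'Intro [plain note] then [1. first [inner] item] and [2. second]';

// Variant with the optional item number.
$optionalPattern = '~\[ (?:(\d+)\.\s)? ( [^][]*+ (?:(?R) [^][]*)*+ ) ]~x';
// Variant with the conditional: the number is required unless we are in a recursion.
$strictPattern   = '~\[ (?(R)|(\d+)\.\s) ( [^][]*+ (?:(?R) [^][]*)*+ ) ]~x';

$optional = [];
if (preg_match_all($optionalPattern, $text, $m)) {
    // "[plain note]" is matched too, with an empty string as its key.
    $optional = array_combine($m[1], $m[2]);
}

$strict = [];
if (preg_match_all($strictPattern, $text, $m)) {
    // "[plain note]" is skipped; "[inner]" is still accepted inside item 1.
    $strict = array_combine($m[1], $m[2]);
}

print_r($optional);
print_r($strict);
```

With the optional group, the un-numbered bracket ends up in the result under an empty key, which is usually not what you want when building the associative array.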