Let's say I have this text input.
tes{}tR{R{abc}aD{mnoR{xyz}}}
I want to extract the following output:
R{abc}
R{xyz}
D{mnoR{xyz}}
R{R{abc}aD{mnoR{xyz}}}
Currently, I can only extract what's inside the {} groups, using the balancing group approach described on MSDN. Here's the pattern:
^[^{}]*(((?'Open'{)[^{}]*)+((?'Target-Open'})[^{}]*)+)*(?(Open)(?!))$
Does anyone know how to include the R{} and D{} in the output?
I think that a different approach is required here. Once you match the first, larger group R{R{abc}aD{mnoR{xyz}}} (see my comment about the possible typo), you won't be able to get the subgroups inside it: the match has already consumed them, so the regex can't go back and capture the individual R{ ... } groups.
So there had to be a way to capture without consuming, and the obvious way to do that is a positive lookahead. Inside it you can put the expression you used, with some changes to adapt it to the new focus. I came up with:
(?=([A-Z](?:(?:(?'O'{)[^{}]*)+(?:(?'-O'})[^{}]*?)+)+(?(O)(?!))))
[I also renamed 'Open' to 'O' and removed the named capture for the closing brace, to make the pattern shorter and avoid noise in the matches]
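To see why wrapping the capture in a lookahead helps, here is a small sketch in Python. Python's built-in re module has no balancing groups, so this uses a much simpler pattern (two lowercase letters) purely to illustrate the "capture without consuming" idea; the sample string is made up.

```python
import re

text = "abcd"

# A normal match consumes what it matches, so matches cannot overlap.
consuming = re.findall(r"[a-z]{2}", text)

# A capture group inside a lookahead matches zero characters itself,
# so the engine retries at every position and overlapping captures survive.
overlapping = re.findall(r"(?=([a-z]{2}))", text)

print(consuming)    # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd']
```

The same thing happens with the nested groups: each R{ or D{ is a fresh starting position, and the lookahead leaves the text in place for the next attempt.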
On regexhero.net (the only free .NET regex tester I know so far), I got the following capture groups:
1: R{R{abc}aD{mnoR{xyz}}}
1: R{abc}
1: D{mnoR{xyz}}
1: R{xyz}
Breakdown of regex:
(?= # Opening positive lookahead
([A-Z] # Opening capture group and any uppercase letter (to match R & D)
(?: # First non-capture group opening
(?: # Second non-capture group opening
(?'O'{) # Get the named opening brace
[^{}]* # Any non-brace
)+ # Close of second non-capture group and repeat over as many times as necessary
(?: # Third non-capture group opening
(?'-O'}) # Removal of named opening brace when encountered
[^{}]*? # Any other non-brace characters in case there are more nested braces
)+ # Close of third non-capture group and repeat over as many times as necessary
)+ # Close of first non-capture group and repeat as many times as necessary for multiple side by side nested braces
(?(O)(?!)) # Condition to prevent unbalanced braces
) # Close capture group
) # Close positive lookahead
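If you don't have a .NET engine at hand, the push/pop behaviour of the 'O' group can be imitated by hand. This is only a rough sketch in plain Python, not the regex itself: the balancing group is swapped for an explicit list used as a stack, and the function name extract_balanced is my own. It follows a single brace group per starting letter, so it does not reproduce every edge case of the pattern (side-by-side groups such as R{a}{b}, for instance).

```python
def extract_balanced(text):
    """Collect every <uppercase letter>{...} substring whose braces balance."""
    results = []
    for start in range(len(text) - 1):
        # Candidate start: an uppercase letter directly followed by '{'
        if not ("A" <= text[start] <= "Z" and text[start + 1] == "{"):
            continue
        open_stack = []  # plays the role of the 'O' capture stack
        for i in range(start + 1, len(text)):
            if text[i] == "{":
                open_stack.append(i)      # like (?'O'{): push
            elif text[i] == "}":
                open_stack.pop()          # like (?'-O'}): pop
                if not open_stack:        # like (?(O)(?!)): nothing left open
                    results.append(text[start:i + 1])
                    break
    return results

print(extract_balanced("tes{}tR{R{abc}aD{mnoR{xyz}}}"))
# ['R{R{abc}aD{mnoR{xyz}}}', 'R{abc}', 'D{mnoR{xyz}}', 'R{xyz}']
```

Because the outer loop restarts at every position, nested groups are reported as well, which is the job the lookahead does in the regex.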
I also wanted to try this out on the PCRE engine, since it supports recursive patterns. I'm more familiar with it, and it yields a shorter regex :)
(?=([A-Z]{(?:[^{}]|(?1))+}))
regex101 demo
(?= # Opening positive lookahead
([A-Z] # Opening capture group and any uppercase letter (to match R & D)
{ # Opening brace
(?: # Opening non-capture group
[^{}] # Matches non braces
| # OR
(?1) # Recurse first capture group
)+ # Close non-capture group and repeat as many times as necessary
} # Closing brace
) # Close of capture group
) # Close of positive lookahead
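The (?1) recursion can be pictured as a function that calls itself whenever it meets another letter-plus-brace group. Below is a minimal sketch in plain Python (the built-in re module does not support recursion, so this is a hand-written matcher, and the names match_group and find_groups are hypothetical). It is meant to mirror the PCRE pattern for inputs like the one in the question, not to be a general replacement for it.

```python
def match_group(text, pos):
    """Try to match [A-Z]{(?:[^{}]|(?1))+} at pos; return the end index or None."""
    if not (pos + 1 < len(text) and "A" <= text[pos] <= "Z" and text[pos + 1] == "{"):
        return None
    i = pos + 2
    while i < len(text):
        ch = text[i]
        if ch == "}":
            # The '+' needs at least one item between the braces
            return i + 1 if i > pos + 2 else None
        if ch == "{":
            return None  # a brace that is not introduced by an uppercase letter
        nested_end = match_group(text, i)  # the (?1) branch
        # Fall back to the [^{}] branch (one plain character) if no nested group
        i = nested_end if nested_end is not None else i + 1
    return None  # ran out of text before the closing brace


def find_groups(text):
    """Try every start position, like the lookahead does."""
    found = []
    for pos in range(len(text)):
        end = match_group(text, pos)
        if end is not None:
            found.append(text[pos:end])
    return found


print(find_groups("tes{}tR{R{abc}aD{mnoR{xyz}}}"))
# ['R{R{abc}aD{mnoR{xyz}}}', 'R{abc}', 'D{mnoR{xyz}}', 'R{xyz}']
```

Note that, like the PCRE pattern, this rejects an empty group such as R{} and any inner brace that does not follow an uppercase letter.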