I have found the following resources on Balanced Matching for .net Regexes:
From what I have read in these, the following example should work:
This regex should find an "a" anywhere within an angle-bracket group, no matter how deep. It should match "<a>
", "<<a>>
", "<a<>>
", "<<>a>
", "<<><a>>
", etc.
(?<=
^
(
(
<(?<Depth>)
|
>(?<-Depth>)
)
[^<>]*?
)+?
)
(?(Depth)a|(?!))
matching on the "a" in the string "<<>a>"
While it will work for strings "<a<>>
" and "<<a>>
", I can't get it to match an "a" that is following a ">".
According to the explanations I have read, the first two "<"s should increment Depth twice, then the first ">" should decrement it once. At this point, (?(Depth)a|(?!)) should perform the "yes" option, but the regex never even makes it here.
Consider the following regex, which makes no such check and still fails to match the string in question:
(?<=
^
(
(
<(?<Depth>)
|
>(?<-Depth>)
)
[^<>]*?
)+?
)
a
Am I missing something, or is the regex engine working incorrectly?
You need to remember that a lookbehind only scans as far back as it has to in order to satisfy its embedded regex. The expression in your lookbehind is only required to match an angle bracket, so it only looks as far back as the latest one. If it's a left angle bracket, (?<Depth>)
pushes an empty string onto the stack represented by that capture group. But if it's a right angle bracket...
It is worth mentioning that if no named group N exists when trying to pop (<-N>) then it will fail... *
In other words it's not the conditional expression -- (?(Depth)a|(?!))
-- that's making your regex fail (as you observed), it's the attempt to "decrement" the "counter". As far as I can tell, your regex is exactly equivalent to
(?<=<[^<>]*)a
So, to answer your question, .NET's balanced-construct matching is not broken. Byzantine yes, but broken, no. :D
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With