Are regex atomic groups distributive?
I.e. is (?>A?B?)
always equivalent to (?>A?)(?>B?)
?
If not please provide a counter example.
Atomic groups in general
The atomic group (?>regex1|regex2|regex3)
takes only the first successful match within it. In other words, it doesn't allow backtracking.
Regexes are evaluated left-to-right, so you express the order you intend things to match. The engine starts at the first position, trying to make a successful match, backtracking if necessary. If any path through the expression would lead to a successful match, then it will match at that position.
Atomic groups are not distributive. Consider these patterns evaluated over ABC
:
(?>(AB?))(?>(BC))
(no match) and (?>(AB?)(BC))
(matches ABC
).
Atomic Groups with all optional components
But, your scenario where both parts are optional may be different.
Considering an atomic group with 2 greedy optional parts A and B ((A)?
and (B)?
). At any position, if A
matches, it can move on to evaluate the optional B
. Otherwise, if A
doesn't match, that's fine, too because it's optional. Therefore, (A)?
matches at any position. The same logic applies for the optional B
. The question remaining is whether there can be any difference in backtracking.
In the case of all optional parts ((?>A?B?)
), since each part always matches, there's no reason to backtrack within the atomic group, so it will always match. Then, since it is in an atomic group, it is prohibited from backtracking.
In the case of separate atomic groups ((?>A?)(?>B?)
), each part always matches, and the engine is prohibited from backtracking in either case. This means the results will be the same.
To reiterate, the engine can only use the first possible match in (?>A?)(?>B?)
, which will always be the same match as the first possible match in (?>A?B?)
. Thus, if my reasoning is correct,for this special case, the matches will be the same for multiple optional atomic groups as a single atomic group with both optional components.
Since you didn't specify, I'll assume you're referring to Perl regexes, since I haven't seen the (?>)
grouping operator in any other language.
Consider the following:
ra = 'A?'
rb = 'B?'
/(?>${ra} ${rb})/x
is the same as/(?>${ra})(?>${rb})/x
.
In this case, yes, it works either way; however, because (?>)
disables backtracking, this is not the case with some other values of ra
and rb
.
For example, given:
ra = 'A*'
rb = 'AB*'
/(?>${ra} ${rb})/x
!= /(?>${ra})(?>${rb})/x
.
In the latter, rb could never match, since ra would consume an entire sequence of A's, and would not allow backtracking. Note that this would work if we used (?:)
as the grouping operator. Note also, that if we used capture groups ()
, then the match would be the same, but the side effects (assignment to \1
, \2
, ...) would be different.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With