I'm getting unexpected backtracking of the +
quantifier of a Raku regex.
In this regex:
'abc' ~~ m/(\w+) {say $0} <?{ $0.substr(*-1) eq 'b' }>/;
say $0;
I get the expected result:
「abc」 # inner say
「ab」 # inner say
「ab」 # final say
That is, the (greedy) +
quantifier gets all letters and then the conditional fails. After that it starts backtracking by freeing the last got letter, until the conditional evaluates to true.
However, the backtracking does not seem to work the same way when I put the quantifier outside the capturing group:
'abc' ~~ m/[(\w)]+ {say $0} <?{ $0.tail eq 'b' }>/;
say $0;
Result:
[「a」 「b」 「c」] # inner say
[「a」 「b」 「c」] # why this extra inner say? Shouldn't this backtrack to [「a」 「b」]?
[「a」 「b」 「c」] # why this extra inner say? Shouldn't this backtrack to [「a」 「b」]?
[「b」 「c」] # Since we could not successfully backtrack, We go on matching by increasing the position
[「b」 「c」] # Previous conditional fails. We get this extra inner say
[「c」] # Since we could not successfully backtrack, We go on matching by increasing the position
Nil # final say, no match because we could not find a final 'b'
Is this behaviour expected? If so: Why do they work differently? Is it possible to mimic the first regex but still keep the quantifier outside the capturing group?
NOTE:
Using a lazy quantifier 'solves' the issue... This is expected because the difference seems to happen with backtracking, and that does not happen with the lazy quantifier.
'abc' ~~ m/[(\w)]+? {say $0} <?{ $0.tail eq 'b' }>/;
[「a」]
[「a」 「b」]
[「a」 「b」]
However for performance reasons I'd rather use a greedy quantifier (the example in this question is a simplification).
I don't think the issue is with backtraking. But looks like the intermediate $0
exposed has retained the previous iteration captures. Consider this expression,
'abc' ~~ m/[(\w)]+ {say "Match:",$/.Str,";\tCapture:",$0} <?{ False }>/;
This is the output:
Match:abc; Capture:[「a」 「b」 「c」]
Match:ab; Capture:[「a」 「b」 「c」]
Match:a; Capture:[「a」 「b」 「c」]
Match:bc; Capture:[「b」 「c」]
Match:b; Capture:[「b」 「c」]
Match:c; Capture:[「c」]
As you can see the match is in proper order, abc ab a ...
. But the captured array for ab
match is also [「a」 「b」 「c」]
. I suspect this to be a bug.
For your case, there are a couple of approaches.
$/
for the condition check
'abc' ~~ m/[(\w)]+ <?{ $/.Str.substr(*-1) eq 'b' }>/;
'abc' ~~ m/([(\w)]+) <?{ $0[0][*-1] eq 'b' }>/;
Here $0
matches the outer group, $0[0]
matches the first inner group, $[0][*-1]
matches the character that was finally matched in this iteration.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With