Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do I get different backtracking with these Raku regexes?

I'm getting unexpected backtracking of the + quantifier of a Raku regex.

In this regex:

'abc' ~~ m/(\w+) {say $0}  <?{ $0.substr(*-1) eq 'b' }>/;

say $0;

I get the expected result:

「abc」  # inner say
「ab」   # inner say

「ab」   # final say

That is, the (greedy) + quantifier gets all letters and then the conditional fails. After that it starts backtracking by freeing the last got letter, until the conditional evaluates to true.

However, the backtracking does not seem to work the same way when I put the quantifier outside the capturing group:

'abc' ~~ m/[(\w)]+ {say $0}  <?{ $0.tail eq 'b' }>/;

say $0;

Result:

[「a」 「b」 「c」]  # inner say
[「a」 「b」 「c」]  # why this extra inner say? Shouldn't this backtrack to [「a」 「b」]?
[「a」 「b」 「c」]  # why this extra inner say? Shouldn't this backtrack to [「a」 「b」]?
[「b」 「c」]      # Since we could not successfully backtrack, We go on matching by increasing the position
[「b」 「c」]      # Previous conditional fails. We get this extra inner say
[「c」]          # Since we could not successfully backtrack, We go on matching by increasing the position

Nil            # final say, no match because we could not find a final 'b'

Is this behaviour expected? If so: Why do they work differently? Is it possible to mimic the first regex but still keep the quantifier outside the capturing group?

NOTE:

Using a lazy quantifier 'solves' the issue... This is expected because the difference seems to happen with backtracking, and that does not happen with the lazy quantifier.

'abc' ~~ m/[(\w)]+? {say $0}  <?{ $0.tail eq 'b' }>/;

[「a」]
[「a」 「b」]

[「a」 「b」]

However for performance reasons I'd rather use a greedy quantifier (the example in this question is a simplification).

like image 457
Julio Avatar asked Dec 09 '20 23:12

Julio


1 Answers

I don't think the issue is with backtraking. But looks like the intermediate $0 exposed has retained the previous iteration captures. Consider this expression,

'abc' ~~ m/[(\w)]+ {say "Match:",$/.Str,";\tCapture:",$0}  <?{ False }>/;

This is the output:

Match:abc;  Capture:[「a」 「b」 「c」]
Match:ab;   Capture:[「a」 「b」 「c」]
Match:a;    Capture:[「a」 「b」 「c」]
Match:bc;   Capture:[「b」 「c」]
Match:b;    Capture:[「b」 「c」]
Match:c;    Capture:[「c」]

As you can see the match is in proper order, abc ab a .... But the captured array for ab match is also [「a」 「b」 「c」]. I suspect this to be a bug.


For your case, there are a couple of approaches.

  1. Just use $/ for the condition check
    'abc' ~~ m/[(\w)]+  <?{ $/.Str.substr(*-1) eq 'b' }>/;
    
  2. Or, additionally also capture the group with quatifiers as well.
    'abc' ~~ m/([(\w)]+) <?{ $0[0][*-1] eq 'b' }>/;
    
    Here $0 matches the outer group, $0[0] matches the first inner group, $[0][*-1] matches the character that was finally matched in this iteration.

like image 109
Prasanna Avatar answered Oct 12 '22 01:10

Prasanna