:ex and :ov adverbs with Perl 6 named captures

Tags: raku

I don't fully understand why the results are different here. Does :ov apply only to <left>, so that once it has found the longest match it doesn't try anything else?

my regex left {
    a  | ab
}
my regex right {
    bc | c
}

"abc" ~~ m:ex/<left><right> 
   {put $<left>, '|', $<right>}/; # 'ab|c' and 'a|bc'
say '---';

"abc" ~~ m:ov/<left><right> 
   {put $<left>, '|', $<right>}/; # only 'ab|c'
Eugene Barsky asked Oct 24 '17 15:10


2 Answers

Types of adverbs

It's important to understand that there are two different types of regex adverbs:

  1. Those that fine-tune how your regex code is compiled (e.g. :sigspace/:s, :ignorecase/:i, ...). These can also be written inside the regex, and only apply to the rest of their lexical scope within the regex.
  2. Those that control how regex matches are found and returned (e.g. :exhaustive/:ex, :overlap/:ov, :global/:g). These apply to a given regex matching operation as a whole, and have to be written outside the regex, as an adverb of the m// operator or .match method.
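As a rough illustration of the two kinds, here is a minimal sketch. The strings and patterns are made up for the example and are not taken from the question.

```raku
# Type 1: :i written inside the regex only affects what follows it.
my $mixed = so "aBC" ~~ / a :i bc /;   # 'a' is case-sensitive, 'bc' is not
my $upper = so "ABC" ~~ / a :i bc /;   # fails: the leading 'a' is still case-sensitive
say $mixed;   # True
say $upper;   # False

# Type 2: :g is passed to the matching operation, not written in the regex.
my @words = "one two".match(/ \w+ /, :g);
say @words.elems;   # 2
```

The same :g adverb could also be written as m:g/ \w+ / on the operator form.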

Match adverbs

Here is what the relevant adverbs of the second type do:

  • m:ex/.../ finds every possible match at every possible starting position.
  • m:ov/.../ finds the first possible match at every possible starting position.
  • m:g/.../ finds the first possible match at every possible starting position that comes after the end of the previous match (i.e., non-overlapping).
  • m/.../ finds the first possible match at the first possible starting position.

(In each case, the regex engine moves on as soon as it has found what it was meant to find at a given position. That is why you don't see additional output even if you put print statements inside the regexes.)
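To make the four variants concrete, here is a small sketch that counts the matches each one returns. The string "aaa" and the pattern a+ are assumptions chosen for illustration, and the .match method is used instead of the m// operator so that the adverbs can be passed as named arguments.

```raku
my $str = "aaa";

# :ex  every possible match at every starting position
my @ex = $str.match(/ a+ /, :ex).map(*.Str);
# :ov  one match per starting position
my @ov = $str.match(/ a+ /, :ov).map(*.Str);
# :g   one match per starting position after the end of the previous match
my @g  = $str.match(/ a+ /, :g).map(*.Str);
# no adverb: a single Match object rather than a list
my $first = $str.match(/ a+ /).Str;

say @ex.elems;   # 6
say @ov.elems;   # 3
say @g.elems;    # 1
say $first;      # aaa
```

With :ex the counts per starting position are 3, 2 and 1; with :ov each of the three positions contributes one match; with :g the first match already consumes the whole string.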

Your example

In your case, there are only two possible matches: ab|c and a|bc.
Both start at the same position in the input string, namely at position 0.
So only m:ex/.../ will find both of them – all the other variants will only find one of them and then move on.
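This can be checked by collecting the named captures instead of printing them from inside the regex. The following is a sketch using the left and right regexes from the question; the second input string "abcabc" is an assumption added to show a case where :ov does return more than one match.

```raku
my regex left  { a  | ab }
my regex right { bc | c  }

# Join the two named captures of each match as 'left|right'.
my @ex = "abc".match(/ <left> <right> /, :ex).map({ $_<left> ~ '|' ~ $_<right> });
my @ov = "abc".match(/ <left> <right> /, :ov).map({ $_<left> ~ '|' ~ $_<right> });

say @ex;   # [ab|c a|bc]
say @ov;   # [ab|c]

# With two possible starting positions (0 and 3), :ov reports one match for each.
my $ov-count = "abcabc".match(/ <left> <right> /, :ov).elems;
say $ov-count;   # 2
```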

smls answered Jan 04 '23 11:01


:ex will find all possible combinations of overlapping matches.

:ov acts like :ex except that it constrains the search to a single match per starting position. :ex is allowed to go back to a starting position it has already matched from and backtrack to find further matches there, so it may find several matches that begin at the very start of the string; :ov will only ever return one match beginning there. In your example both matches cover the same three characters, abc, starting at position 0, so :ex reports both ways of splitting them while :ov reports only one.

Documentation:
https://docs.perl6.org/language/regexes

Exhaustive:

To find all possible matches of a regex – including overlapping ones – and several ones that start at the same position, use the :exhaustive (short :ex) adverb.

Overlapping:

To get several matches, including overlapping matches, but only one (the longest) from each starting position, specify the :overlap (short :ov) adverb.

antiduh answered Jan 04 '23 12:01