
* quantifier in Perl 6

This seems to be something very basic that I don't understand here.

Why doesn't "babc" match / a * / ?

> "abc" ~~ / a /
「a」
> "abc" ~~ / a * /
「a」
> "babc" ~~ / a * /
「」                    # WHY?
> "babc" ~~ / a + /
「a」
Eugene Barsky asked Dec 07 '18 21:12



2 Answers

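Because the * quantifier makes the preceding atom match zero or more times.

「」 is the first match of / a * / in any string that does not begin with a, because the pattern can succeed at position 0 without consuming anything. For example: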

say "xabc" ~~ / a * . /; # OUTPUT: 「x」

This behaves the same as the following, apart from the capture group:

say "xabc" ~~ / (a+)? . /;

If you make the pattern more precise, you get a different result:

say "xabc" ~~ / x a * /; # OUTPUT: 「xa」
say "xabc" ~~ / a * b /; # OUTPUT: 「ab」
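A minimal sketch of why 「」 is not a failure: the empty result is a successful, zero-length match. The variable name $m is only for illustration.

```raku
my $m = "babc" ~~ / a * /;

say $m.defined;            # True
say so $m;                 # True: a zero-length match still counts as success
say $m.Str.chars;          # 0

# A pattern that really cannot match yields a false result instead.
say so "babc" ~~ / z /;    # False
```

So a condition such as `if "babc" ~~ / a * /` is taken, even though nothing was consumed.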
Pavlo Bashynskyi answered Nov 23 '22 22:11


The answers here are correct; I'll just try to present them in a more coherent form:

Matching always starts from the left

The regex engine always starts at the left of the string, and prefers left-most matches over longer matches.

* matches empty strings

The regex a* can match the strings '', 'a', 'aa', etc. At a given starting position it will always prefer the longest match it finds, but if it can't find a match longer than the empty string, it'll just match the empty string.
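As a small illustration of that greediness, assuming nothing beyond the built-in smartmatch, a* takes the whole run of a's available at the position where it starts:

```raku
say 'aaab' ~~ / a* /;   # 「aaa」
say 'ab'   ~~ / a* /;   # 「a」
say ''     ~~ / a* /;   # 「」
```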

Putting it together

In 'abc' ~~ /a*/, the regex engine starts at position 0, the a* matches as many a's as it can, and thus matches the first character.

In 'babc' ~~ /a*/, the regex engine starts at position 0, and the a* can match only zero characters. It does so successfully. Since the overall match succeeds, there is no reason to try again at position 1.
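The two cases above can be checked by asking the Match object where it starts and ends. This is only a sketch; describe is a hypothetical helper written for this example, not a built-in.

```raku
# Hypothetical helper: report the span and text of the first match.
sub describe(Str $s, Regex $rx --> Str) {
    my $m = $s ~~ $rx;
    return "no match" unless $m;
    return "from {$m.from} to {$m.to}: '{$m.Str}'";
}

say describe('abc',  / a* /);   # from 0 to 1: 'a'
say describe('babc', / a* /);   # from 0 to 0: ''
say describe('babc', / a+ /);   # from 1 to 2: 'a'
```

With a+ the attempt at position 0 fails, so the engine moves on to position 1; with a* the attempt at position 0 already succeeds, so it never gets that far.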

moritz answered Nov 23 '22 22:11