Consider the following Perl script:
#!/usr/bin/perl
my $str = 'not-found=1,total-found=63,ignored=2';
print "1. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);
print "2. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);
print "3. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);
print "4. matched using regex\n" if ($str =~ m/total-found=(\d+)/g);
print "Bye!\n";
The output after running this is:
1. matched using regex
3. matched using regex
Bye!
The same regex matches once and then fails on the very next attempt. Any idea why alternate attempts to match the same string with the same regex fail in Perl?
Thanks!
Here is the long explanation of why your code doesn't work.
The /g modifier changes the behaviour of the regex to “global matching”. This will match all occurrences of the pattern in the string. However, how this matching is done depends on context. The two (main) contexts in Perl are list context (the plural) and scalar context (the singular).
In list context, a global regex match returns a list of all matched substrings, or a flat list of all matched captures:
use feature 'say'; # needed for say (Perl 5.10+)
local $_ = "foobaa"; # lexical "my $_" was removed in Perl 5.24
my $regex = qr/[aeiou]/;
my @matches = /$regex/g; # match all vowels
say "@matches"; # "o o a a"
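The "flat list of all matched captures" case is easiest to see with a pattern that has two capture groups. The following is a minimal sketch on the string from the question; the pattern `([\w-]+)=(\d+)` and the names `@pairs` and `%count` are illustrative choices, not part of the original script.

```perl
use strict;
use warnings;

my $str = 'not-found=1,total-found=63,ignored=2';

# In list context, /g returns every capture of every match as one flat list:
# ('not-found', 1, 'total-found', 63, 'ignored', 2)
my @pairs = $str =~ /([\w-]+)=(\d+)/g;

# A flat key/value list can be assigned straight to a hash.
my %count = @pairs;

print "total-found is $count{'total-found'}\n";   # total-found is 63
```

Because the match runs in list context, it consumes all occurrences at once and leaves no iterator position behind.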
In scalar context, the match seems to return a Perl boolean describing whether the regex matched:
my $match = /$regex/g;
say $match; # "1" (on failure: the empty string)
However, the regex has turned into an iterator. Each time the regex match is executed, the regex starts at the current position in the string, and tries to match. If it matches, it returns true. If the match fails, then two things happen: the match returns false, and the current position in the string is reset to the start.
Because the position in the string was reset, the next match will succeed again.
my $match;
say $match while $match = /$regex/g;
say "The match returned false, or the while loop would have gone on forever";
say "But we can match again" if /$regex/g;
The second effect, resetting the position, can be cancelled with the additional /c flag.
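A small sketch of the difference, assuming a throwaway string `'aXb'` and using the built-in `pos` function to read the current match position after each attempt (all variable names here are illustrative):

```perl
use strict;
use warnings;

my $text = 'aXb';

my $hit1 = $text =~ /X/g;            # succeeds; the position is now just past the X
my $after_hit = pos($text);          # 2

my $miss1 = $text =~ /X/g;           # fails: no X after offset 2
my $after_plain_fail = pos($text);   # undef, because the failure reset the position

my $hit2  = $text =~ /X/g;           # succeeds again, starting from the beginning
my $miss2 = $text =~ /X/gc;          # fails, but /c keeps the position
my $after_c_fail = pos($text);       # still 2

printf "plain /g after failure: %s\n", $after_plain_fail // 'undef';
printf "/gc after failure:      %s\n", $after_c_fail     // 'undef';
```

With /gc, a failed attempt leaves the position alone, so a different pattern can be tried at the same spot.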
The position in a string can be accessed with the pos function: pos($string) returns the current position, which can be set like pos($string) = 0.
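To tie this back to the script in the question, the sketch below repeats the same scalar-context match four times and records `pos($str)` after each attempt. The loop, the `@log` array and the final `grep` are illustrative additions, not part of the original script.

```perl
use strict;
use warnings;

my $str = 'not-found=1,total-found=63,ignored=2';
my @log;   # one entry per attempt: [matched?, pos afterwards]

for my $attempt (1 .. 4) {
    # Scalar (boolean) context: /g resumes from pos($str) each time.
    my $matched = ($str =~ m/total-found=(\d+)/g) ? 1 : 0;
    push @log, [$matched, pos($str)];
    printf "%d. %s, pos = %s\n", $attempt,
        $matched ? 'matched' : 'failed', pos($str) // 'undef';
}

# Without /g there is no iterator state, so every attempt succeeds.
my $plain_hits = grep { $str =~ m/total-found=(\d+)/ } 1 .. 4;
print "without /g: $plain_hits of 4 attempts matched\n";
```

Attempts 1 and 3 match and leave the position at offset 26, just past `total-found=63`; attempts 2 and 4 start from there, find nothing, and reset the position. For a plain "does it match?" test like the one in the question, dropping /g is usually the simplest fix.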
The regex can also be anchored at the current position with the \G assertion, much like ^ anchors a regex at the start of the string.
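As a rough illustration of what the anchor changes, the sketch below runs the same pattern with and without `\G` in list context. The string `'aabaa'` and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $s = 'aabaa';

# Unanchored: each match may skip ahead, so the 'b' is simply stepped over.
my @free = $s =~ /a/g;         # 4 matches

# Anchored: each match must begin exactly where the previous one ended,
# so matching stops as soon as the 'b' is reached.
my @anchored = $s =~ /\Ga/g;   # 2 matches

printf "unanchored: %d, anchored with \\G: %d\n", scalar @free, scalar @anchored;
```

This "no skipping" property is what makes `\G` useful for scanning input piece by piece.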
This m//gc-style matching makes it easy to write a tokenizer:
my @tokens;
local $_ = "1, abc, 2 "; # lexical "my $_" was removed in Perl 5.24
# pos($_) is undef before the first match, so treat it as 0
TOKEN: while ((pos($_) // 0) < length($_)) {
    /\G\s+/gc and next; # skip whitespace
    # if one of the following matches fails, the next token is tried
    if    (/\G(\d+)/gc) { push @tokens, [NUM => $1] }
    elsif (/\G,/gc)     { push @tokens, ['COMMA'] }
    elsif (/\G(\w+)/gc) { push @tokens, [STR => $1] }
    else { last TOKEN } # break the loop only if nothing matched at this position
}
say "[@$_]" for @tokens;
Output:
[NUM 1]
[COMMA]
[STR abc]
[COMMA]
[NUM 2]