After having read this similar question and having tried my code several times, I keep on getting the same undesired output.
Let's assume the string I'm searching is "I saw wilma yesterday". The regex should capture each run of word characters followed by an 'a', together with up to 5 optional characters (spaces included) that follow it.
The code I wrote is the following:
$_ = "I saw wilma yesterday";
if (@m = /(\w+)a(.{5,})?/g) {
    print "found " . @m . " matches\n";
    foreach (@m) {
        print "\t\"$_\"\n";
    }
}
However, I kept on getting the following output:
found 2 matches
"s"
"w wilma yesterday"
while I expected to get the following one:
found 3 matches:
"saw wil"
"wilma yest"
"yesterday"
until I found out that the return values inside @m were $1 and $2, as you can notice.
Now, since the /g flag is on, and I don't think the problem is about the regex, how could I get the desired output?
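To make the behaviour described above concrete, here is a minimal sketch (the variable names are only illustrative) of what a /g match returns in list context: not the full matched text, but the capture groups of every successful match, flattened into one list.

```perl
use strict;
use warnings;

my $str = "I saw wilma yesterday";

# List context + /g: the result is ($1, $2) for each successful match,
# all flattened into a single list.
my @m = $str =~ /(\w+)a(.{5,})?/g;

# Only one match succeeds here: after the first "a", the greedy .{5,}
# swallows the rest of the string, so nothing is left for a second match.
print scalar(@m), " elements\n";
print "\t\"$_\"\n" for @m;
```

So the "2 matches" reported by the original code are really the two groups of a single match.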
You can try this pattern, which allows overlapping results:
(?=\b(\w+a.{1,5}))
or
(?=(?i)\b([a-z]+a.{0,5}))
example:
use strict;
my $str = "I saw wilma yesterday";
my @matches = ($str =~ /(?=\b([a-z]+a.{0,5}))/gi);
print join("\n", @matches),"\n";
more explanations:
You can't get overlapping results from an ordinary match: once a character has been "eaten" by the regex engine, it can't be eaten a second time. The trick to get around this constraint is to use a lookahead (a zero-width assertion that only checks, but does not consume anything), which can run through the same part of the string several times, and to put a capturing group inside it.
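As a rough illustration of the difference, the sketch below runs the same sub-pattern twice, once as a normal (consuming) match and once wrapped in a lookahead. The array names are just placeholders.

```perl
use strict;
use warnings;

my $str = "I saw wilma yesterday";

# Consuming match: "saw wil" eats the start of "wilma",
# so that word can no longer be matched from its beginning.
my @consuming = $str =~ /\b([a-z]+a.{0,5})/gi;

# Lookahead: the assertion is zero-width, so the next attempt starts
# just one character further on and "wilma" is still available.
my @lookahead = $str =~ /(?=\b([a-z]+a.{0,5}))/gi;

print "consuming: ", join(" | ", @consuming), "\n";
print "lookahead: ", join(" | ", @lookahead), "\n";
```

The consuming version should find only two of the three expected substrings, while the lookahead version finds all three.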
For another example of this behaviour, you can try the example code without the word boundary (\b) to see the result.
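If you want to try that quickly, something like the following should do (again only a sketch, with made-up array names). Without \b the lookahead is allowed to succeed in the middle of a word, so you also get the "tails" of each word.

```perl
use strict;
use warnings;

my $str = "I saw wilma yesterday";

# Same pattern, with and without the word-boundary anchor.
my @with    = $str =~ /(?=\b([a-z]+a.{0,5}))/gi;
my @without = $str =~ /(?=([a-z]+a.{0,5}))/gi;

printf "with \\b:    %d results\n", scalar @with;
printf "without \\b: %d results\n", scalar @without;

# Without \b, results such as "ilma yest" or "day" show up as well,
# because the match may start at any letter that still has an "a" after it.
print "\t\"$_\"\n" for @without;
```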
First, you want a single group that captures the whole expression, i.e.:
/(\w+a(?:.{5,})?)/
Next, you want each search to start one character past the position where the previous match began.
The pos() function allows you to specify where a /g regex starts its search from.
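A minimal sketch of that idea could look like the code below. Note the assumptions: I use .{0,5} instead of .{5,} so that at most 5 characters follow the 'a' (as the question asks), I borrow the \b anchor so that matches only start at the beginning of a word, and @found is just an illustrative name.

```perl
use strict;
use warnings;

my $str = "I saw wilma yesterday";
my @found;

# Scalar-context /g: each iteration finds the next match after pos($str).
while ($str =~ /\b(\w+a.{0,5})/g) {
    push @found, $1;

    # $-[0] is the offset where the whole match started.
    # Resume the search one character after that point instead of
    # after the end of the match, so overlapping matches are possible.
    pos($str) = $-[0] + 1;
}

print "found " . @found . " matches\n";
print "\t\"$_\"\n" for @found;
```

Because pos() always moves at least one character forward, the loop cannot get stuck on the same match.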