I run a simple program:
my $_ = '/login/.htaccess/.htdf';
s!(/\.ht.*?)$!/!;
print "$_ $1";
Output: /login/ /.htaccess/.htdf
I want this regex to match only /.htdf.
Example 2:
my $_ = 'abcbc';
m/(b.*?)$/;
print "$_ $1\n";
Output: abcbc bcbc
I expect bc.
Why is *? still greedy? (I want the minimal match.)
In general, the regex engine will try to match as many input characters as possible once it encounters a quantified token like \d+ or, in our case, .*. That behavior is called greedy matching because the engine eagerly matches everything it can.
You make it non-greedy by using ".*?". With that construct, at every step where the engine could consume another character with the ".", it first attempts to match whatever may come after the ".*?". This means that if, for instance, nothing comes after the ".*?", it matches nothing.
To use non-greedy Perl-style regular expressions, add a ? (question mark) after the quantifier, usually where the wildcard expression is used. In the example above, the wildcard is .* (period and asterisk). In Perl, the period matches any character except a newline, unless the /s modifier is in effect.
The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex. By using a lazy quantifier, the expression tries the minimal match first.
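As a minimal sketch of the difference, the snippet below runs a greedy and a lazy quantifier against the string from the question. The variable names ($text, $greedy, $lazy) are only illustrative.

```perl
use strict;
use warnings;

my $text = 'abcbc';

# Greedy: after the first "b", .* takes as much as it can.
my ($greedy) = $text =~ /(b.*)/;

# Lazy: .*? takes as little as it can. Nothing follows it in the
# pattern, so it is satisfied with the empty string.
my ($lazy) = $text =~ /(b.*?)/;

print "greedy: $greedy\n";   # greedy: bcbc
print "lazy:   $lazy\n";     # lazy:   b
```

Note that neither pattern is anchored with $ here. Once an anchor follows the quantifier, the lazy version has to grow until the anchor can match.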
Atoms are matched in sequence, and each atom after the first must match at the position where the previous atom left off matching. (The first atom is implicitly preceded by \A(?s:.)*?.) That means that .* (or .*?) doesn't get to decide where it starts matching; it only gets to decide where it stops matching.
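One way to see this is to print the offsets of the capture group, as in the sketch below. It uses Perl's built-in @- and @+ arrays, which hold the start and end offsets of the last successful match; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = 'abcbc';
my ($captured, $start, $end);

if ($text =~ /(b.*?)$/) {
    # $-[1] and $+[1] are the start and end offsets of capture group 1.
    ($captured, $start, $end) = ($1, $-[1], $+[1]);
    print "matched '$captured' from offset $start to $end\n";
}
# prints: matched 'bcbc' from offset 1 to 5
```

The match starts at the first "b" (offset 1) because that is the leftmost position where the whole pattern can succeed. The lazy quantifier only affects how far the match extends from there.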
In the first example, it's not being greedy. \.ht brings the match to position 10, and at position 10, the minimum .*? can match and still have the rest of the pattern match is access/.htdf. In fact, it's the only thing .*? can match at position 10 and still have the rest of the pattern match.
I think you want to remove the last part of the path if it starts with .ht, leaving the preceding / in place. For that, you can use either of the following:
s{/\.ht[^/]*$}{/}
or
s{/\K\.ht[^/]*$}{}
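Here is a short sketch applying both substitutions to the path from the question. The variable names are illustrative, and the \K form assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my $path = '/login/.htaccess/.htdf';

# Variant 1: match the slash as well, then put it back in the replacement.
(my $with_slash = $path) =~ s{/\.ht[^/]*$}{/};

# Variant 2: \K drops the slash from the matched text,
# so only the part after it is replaced (with nothing).
(my $with_k = $path) =~ s{/\K\.ht[^/]*$}{};

print "$with_slash\n";   # /login/.htaccess/
print "$with_k\n";       # /login/.htaccess/
```

Because [^/]* cannot cross a slash, the pattern can no longer start at /.htaccess and run to the end of the string. Only the final path component can satisfy the $ anchor.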
In the second example, it's not being greedy either. b brings the match to position 2, and at position 2, the minimum .*? can match and still have the rest of the pattern match is cbc. In fact, it's the only thing .*? can match at position 2 and still have the rest of the pattern match.
You are probably looking for
/b[^b]*$/
or
/b(?:(?!b).)*$/ # You'd use this if "b" was really more than one char.
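The sketch below tries both patterns on the string from the question, with a capture group added so the result can be printed. The last case uses a made-up multi-character delimiter, END, and a made-up input string, purely to illustrate when the lookahead form is needed.

```perl
use strict;
use warnings;

my $text = 'abcbc';

# Negated character class: after "b", allow anything except another "b".
my ($single) = $text =~ /(b[^b]*)$/;

# Lookahead form: consume one character at a time, but only
# where a "b" does not start at that position.
my ($general) = $text =~ /(b(?:(?!b).)*)$/;

# Hypothetical multi-character delimiter, where a character class won't do.
my $long = 'xxENDyyENDzz';
my ($multi) = $long =~ /(END(?:(?!END).)*)$/;

print "$single\n";    # bc
print "$general\n";   # bc
print "$multi\n";     # ENDzz
```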