I'm confused about the greedy operator on positive and negative look ahead.
script for positive lookahead
foreach (<DATA>){
$_ = m/AAA.+(?=BBB)/g;
print "$&\n";
}
__DATA__
AAA 1121 BBB
AAA 443 CCC
AAA 4431 BBB
ABC 321 EACA
AAA 321 BBB
ACD 431 MAKN
AAA 751 ABC
It's output
AAA 1121
AAA 4431
AAA 321
Negative Lookahead
foreach (<DATA>){
$_ = m/AAA.+(?!BBB)/g;
print "$&\n";
}
It's output
AAA 1121 BBB
AAA 443 CCC
AAA 4431 BBB
AAA 321 BBB
AAA 751 ABC
In the execution of the negative lookahed do not consider the (?!BBB). Because I'm using the greedy operator preceding the (?!BBB). In that case, postive look ahead greed operator considers the (?=BBB). Which is why gives the different result?
I can easily achieve the OP by the code $_ = m/AAA\s\d+(?!.+BBB)/g;.
But i don't know what is the execution of my code?
Let's consider the first example:
AAA 1121 BBB
\_/\_______/^
| | |
| | +--- this (the empty string right there) satisfies (?!BBB)
| |
| +-------- matched by .+
|
+-------------- matched by AAA
This is because the greedy .+ consumes 1121 BBB including the BBB. After it has consumed the rest of the line, (?!BBB) is checked against the remaining empty string. And this empty string satisfies (?!BBB) since it's not followed by BBB?
The algorithm executes like the following. ^ is the current position (there's a current position in the string and a current position in the pattern (kind of)).
Initial state:
AAA 1121 BBB AAA.+(?!BBB)
^ ^
Match AAA
AAA 1121 BBB AAA.+(?!BBB)
^ ^
Match .+
AAA 1121 BBB AAA.+(?!BBB)
^ ^
Check (?!BBB)
AAA 1121 BBB AAA.+(?!BBB)
^ ^
No BBB match at this position => Success!
AAA 1121 BBB
\__________/
Now, let's see why exactly AAA.+(?=BBB) yields a match:
Initial state:
AAA 1121 BBB AAA.+(?=BBB)
^ ^
Match AAA
AAA 1121 BBB AAA.+(?=BBB)
^ ^
Match .+
AAA 1121 BBB AAA.+(?=BBB)
^ ^
Check (?=BBB)
AAA 1121 BBB AAA.+(?=BBB)
^ ^
No BBB match at this position => Backtrack (consume one less char by .+)
Check (?=BBB)
AAA 1121 BBB AAA.+(?=BBB)
^ ^
No BBB match at this position => Backtrack (consume one less char by .+)
Check (?=BBB)
AAA 1121 BBB AAA.+(?=BBB)
^ ^
No BBB match at this position => Backtrack (consume one less char by .+)
Check (?=BBB)
AAA 1121 BBB AAA.+(?=BBB)
^ ^
We do have a BBB match at this position => Success!
AAA 1121 BBB
\_______/
There is no difference in how it works in your two cases, and .+ is greedy in both of your cases.
When matching AAA.+(?=BBB) against AAA 1121 BBB, the most .+ can match starting after AAA is <spc>1121<spc>. Anything longer will cause (?=BBB) to fail.
When matching AAA.+(?!BBB) against AAA 1121 BBB, the most .+ can match starting after AAA is <spc>1121<spc>BBB. Being the rest of the string, it can't possibly match anything longer.
Note that the end of the string isn't followed by BBB, so (?!BBB) matches at the end of the string.
(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR.
I'd go with
say $1 if /^(AAA\s+\S+)\s+(?:(?!BBB)\s)*\z/;
On second thought, I'd go with
my @F = split;
say "$F[0] $F[1]" if $F[0] eq 'AAA' && $F[2] ne 'BBB';
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With