Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Greedy operator result is differ in positive and negative lookahead?

Tags:

regex

perl

I'm confused about the greedy operator on positive and negative look ahead.

script for positive lookahead

foreach (<DATA>){
$_ = m/AAA.+(?=BBB)/g;
print "$&\n";
} 
__DATA__
AAA 1121 BBB
AAA 443  CCC
AAA 4431 BBB
ABC 321  EACA
AAA 321  BBB
ACD 431 MAKN
AAA 751  ABC

It's output

AAA 1121 

AAA 4431 

AAA 321  

Negative Lookahead

foreach (<DATA>){
$_ = m/AAA.+(?!BBB)/g; 
print "$&\n";
} 

It's output

AAA 1121 BBB
AAA 443  CCC
AAA 4431 BBB

AAA 321  BBB

AAA 751  ABC

In the execution of the negative lookahed do not consider the (?!BBB). Because I'm using the greedy operator preceding the (?!BBB). In that case, postive look ahead greed operator considers the (?=BBB). Which is why gives the different result?

I can easily achieve the OP by the code $_ = m/AAA\s\d+(?!.+BBB)/g;.

But i don't know what is the execution of my code?

like image 940
mkHun Avatar asked Nov 21 '25 06:11

mkHun


2 Answers

Let's consider the first example:

AAA 1121 BBB
\_/\_______/^
 |     |    |
 |     |    +--- this (the empty string right there) satisfies (?!BBB)
 |     |
 |     +-------- matched by .+
 |     
 +-------------- matched by AAA

This is because the greedy .+ consumes 1121 BBB including the BBB. After it has consumed the rest of the line, (?!BBB) is checked against the remaining empty string. And this empty string satisfies (?!BBB) since it's not followed by BBB?


Negative lookahead

The algorithm executes like the following. ^ is the current position (there's a current position in the string and a current position in the pattern (kind of)).

  1. Initial state:

    AAA 1121 BBB          AAA.+(?!BBB)
    ^                     ^
    
  2. Match AAA

    AAA 1121 BBB          AAA.+(?!BBB)
       ^                     ^
    
  3. Match .+

    AAA 1121 BBB          AAA.+(?!BBB)
                ^              ^
    
  4. Check (?!BBB)

    AAA 1121 BBB          AAA.+(?!BBB)
                ^                     ^
    
  5. No BBB match at this position => Success!

    AAA 1121 BBB
    \__________/
    

Positive lookahead

Now, let's see why exactly AAA.+(?=BBB) yields a match:

  1. Initial state:

    AAA 1121 BBB          AAA.+(?=BBB)
    ^                     ^
    
  2. Match AAA

    AAA 1121 BBB          AAA.+(?=BBB)
       ^                     ^
    
  3. Match .+

    AAA 1121 BBB          AAA.+(?=BBB)
                ^              ^
    
  4. Check (?=BBB)

    AAA 1121 BBB          AAA.+(?=BBB)
                ^              ^
    

    No BBB match at this position => Backtrack (consume one less char by .+)

  5. Check (?=BBB)

    AAA 1121 BBB          AAA.+(?=BBB)
               ^               ^
    

    No BBB match at this position => Backtrack (consume one less char by .+)

  6. Check (?=BBB)

    AAA 1121 BBB          AAA.+(?=BBB)
              ^                ^
    

    No BBB match at this position => Backtrack (consume one less char by .+)

  7. Check (?=BBB)

    AAA 1121 BBB          AAA.+(?=BBB)
             ^                        ^
    
  8. We do have a BBB match at this position => Success!

    AAA 1121 BBB
    \_______/
    
like image 179
Lucas Trzesniewski Avatar answered Nov 23 '25 07:11

Lucas Trzesniewski


There is no difference in how it works in your two cases, and .+ is greedy in both of your cases.

When matching AAA.+(?=BBB) against AAA 1121 BBB, the most .+ can match starting after AAA is <spc>1121<spc>. Anything longer will cause (?=BBB) to fail.

When matching AAA.+(?!BBB) against AAA 1121 BBB, the most .+ can match starting after AAA is <spc>1121<spc>BBB. Being the rest of the string, it can't possibly match anything longer.

Note that the end of the string isn't followed by BBB, so (?!BBB) matches at the end of the string.


(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR.

I'd go with

say $1 if /^(AAA\s+\S+)\s+(?:(?!BBB)\s)*\z/;

On second thought, I'd go with

my @F = split;
say "$F[0] $F[1]" if $F[0] eq 'AAA' && $F[2] ne 'BBB';
like image 43
ikegami Avatar answered Nov 23 '25 07:11

ikegami



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!