
How does the end of line influence the regex here?

Tags: regex, perl

The following seem the same to me, but they give different results:

$ perl -e '  
my $pop = 298444215;  
$pop =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;   
print $pop,"\n"'  
298,444,215  

$ perl -e '  
my $pop = 298444215;  
$pop =~ s/(?<=\d)(?=(\d\d\d)+)/,/g;  
print $pop,"\n"'  
2,9,8,4,4,4,215  

What I was expecting was the first result (commas placed at the proper positions in the number).
But why is the result so different just from adding or removing the $?

Jim asked Feb 01 '14 20:02


2 Answers

The $ ensures that, from any position where there is a match, the digits ahead form groups of three all the way to the end of the line.

So the matches occur only at these positions (spaces inserted for clarity):

        3     3
      v---v v---v
2 9 8 4 4 4 2 1 5
     ^     ^

The other positions do not match because the digits ahead of them cannot be split into sets of 3 that run to the end.

For example, there is no match here:

    3     3    2
  v---v v---v
2 9 8 4 4 4 2 1 5
 ^

After 2 sets of 3, only 2 digits remain, so the lookahead can match neither the end of the line nor another set of 3 digits.

But without the $, the lookahead matches at more positions:

2 9 8 4 4 4 2 1 5
 ^

Here the lookbehind is satisfied, and so is the lookahead, because there is at least one group of 3 digits ahead:

2 9 8 4 4 4 2 1 5
  ^---^

The lookahead is satisfied at this point and does not need to reach the end of the line.

This of course means that every following position also matches, until the position is almost at the end:

2 9 8 4 4 4 2 1 5
             ^

Here, it cannot match since there are only 2 digits ahead.

Jerry answered Oct 31 '22 21:10


Your first example matches only where a multiple of three digits is the last thing in the line of input, whereas your second example matches wherever a multiple of three digits follows, not necessarily all the way to the end.

To clarify: at the point between the 2 and the 98444215 in the string, your second example finds a match because 984 442 follows. In your first example, the blocks of three digits must be immediately followed by the end of the line, so there is no match.

elbie answered Oct 31 '22 21:10