
When is \G useful in a regex?

I am not clear on the use/need of the \G operator.
I read in the perldoc:

You use the \G anchor to start the next match on the same string where the last match left off.

I don't really understand this statement. When we use /g, we usually move to the character after the last match anyway.
As the example shows:

    $_ = "1122a44";
    my @pairs = m/(\d\d)/g;   # qw( 11 22 44 )

Then it says:

If you use the \G anchor, you force the match after 22 to start with the a:

    $_ = "1122a44";
    my @pairs = m/\G(\d\d)/g;

The regular expression cannot match there since it does not find a digit, so the next match fails and the match operator returns the pairs it already found.

I don't understand this either. "If you use the \G anchor, you force the match after 22 to start with a." But without the \G the matching will be attempted at a anyway right? So what is the meaning of this sentence?
I see that in the example the only pairs printed are 11 and 22. So 44 is not tried.

The example also shows that using the /c option makes a later match find 44 after the while loop.

To be honest, from all these I can not understand what is the usefulness of this operator and when it should be applied.
Could someone please help me understand this, perhaps with a meaningful example?

Update
I think I did not understand this key sentence:

If you use the \G anchor, you force the match after 22 to start with the a . The regular expression cannot match there since it does not find a digit, so the next match fails and the match operator returns the pairs it already found.

This seems to mean that when the match fails, the regex engine makes no further attempts later in the string, which is consistent with the examples in the answers.

Also:

After the match fails at the letter a , perl resets pos() and the next match on the same string starts at the beginning.

Jim asked Feb 23 '14 17:02


2 Answers

\G is an anchor; it indicates where the match is forced to start. When \G is present, it can't start matching at some arbitrary later point in the string; when \G is absent, it can.
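A minimal sketch of that difference, using a made-up sample string (`"ab12"` and the variable names are illustrative assumptions, not from the perldoc). Both attempts start with pos() at offset 0; only the unanchored pattern is allowed to slide forward to the digits.

```perl
use strict;
use warnings;

my $s = "ab12";   # hypothetical sample string

# Anchored: \G pins the attempt to pos($s), here offset 0, where 'a' is not a digit.
pos($s) = 0;
my $anchored = ( $s =~ /\G\d+/g ) ? 1 : 0;

# Unanchored: the engine may skip past "ab" and match "12" further along.
pos($s) = 0;
my $floating = ( $s =~ /\d+/g ) ? 1 : 0;
my $end      = pos($s);   # offset just past the matched "12"

print "anchored=$anchored floating=$floating end=$end\n";
# prints: anchored=0 floating=1 end=4
```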

It is most useful in parsing a string into discrete parts, where you don't want to skip past other stuff. For instance:

    my $string = " a 1 # ";

    while () {
        if ( $string =~ /\G\s+/gc ) {
            print "whitespace\n";
        }
        elsif ( $string =~ /\G[0-9]+/gc ) {
            print "integer\n";
        }
        elsif ( $string =~ /\G\w+/gc ) {
            print "word\n";
        }
        else {
            print "done\n";
            last;
        }
    }

Output with \G's:

    whitespace
    word
    whitespace
    integer
    whitespace
    done

without:

    whitespace
    whitespace
    whitespace
    whitespace
    done

Note that I am demonstrating using scalar-context /g matching, but \G applies equally to list context /g matching and in fact the above code is trivially modifiable to use that:

    my $string = " a 1 # ";

    my @matches = $string =~ /\G(?:(\s+)|([0-9]+)|(\w+))/g;

    while ( my ($whitespace, $integer, $word) = splice @matches, 0, 3 ) {
        if ( defined $whitespace ) {
            print "whitespace\n";
        }
        elsif ( defined $integer ) {
            print "integer\n";
        }
        elsif ( defined $word ) {
            print "word\n";
        }
    }
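Because \G never skips input, the point where the scalar-context loop gives up is also the exact offset of the first thing no rule recognised. The following is a sketch of one way to use that, not part of the original example: the `@tokens` list, the `$stopped_at` variable and the error message are assumptions added for illustration.

```perl
use strict;
use warnings;

my $string = " a 1 # ";
my @tokens;       # hypothetical token list, for illustration only
my $stopped_at;   # offset where no rule matched

pos($string) = 0;
while (1) {
    if    ( $string =~ /\G\s+/gc )      { push @tokens, 'whitespace' }
    elsif ( $string =~ /\G([0-9]+)/gc ) { push @tokens, "integer:$1" }
    elsif ( $string =~ /\G(\w+)/gc )    { push @tokens, "word:$1" }
    else {
        # /c kept pos() after the failed attempts, so it still marks the spot.
        $stopped_at = pos($string) // 0;
        last;
    }
}

print join( ", ", @tokens ), "\n";
if ( $stopped_at < length($string) ) {
    printf "unexpected '%s' at offset %d\n",
        substr( $string, $stopped_at, 1 ), $stopped_at;
}
else {
    print "consumed the whole string\n";
}
# prints:
#   whitespace, word:a, whitespace, integer:1, whitespace
#   unexpected '#' at offset 5
```

Without the \G anchors the loop would silently step over the `#`, so there would be no offset to report.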
ysth answered Nov 25 '22 10:11


But without the \G the matching will be attempted at a anyway right?

Without the \G, it won't be constrained to start matching there. It'll try, but it'll try starting later if required. You can think of every pattern as having an implied \G.*? at the front.

Add the \G, and the meaning becomes obvious.

    $_ = "1122a44";

    my @pairs = m/\G     (\d\d)/xg;   # qw( 11 22 )
    my @pairs = m/\G .*? (\d\d)/xg;   # qw( 11 22 44 )
    my @pairs = m/       (\d\d)/xg;   # qw( 11 22 44 )
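The list-context results above have a scalar-context counterpart that touches on the pos() reset and the /c option raised in the question. This is a sketch under the assumption that each string starts with an unset pos(); the variable names are made up for illustration.

```perl
use strict;
use warnings;

# With /c: a failed match leaves pos() where the last success ended.
my $text = "1122a44";
my @anchored;
while ( $text =~ /\G(\d\d)/gc ) {
    push @anchored, $1;
}
my $pos_after = pos($text);                       # still 4, just before the 'a'
my $later     = $text =~ /(\d\d)/g ? $1 : undef;  # unanchored, skips the 'a'

# Without /c: the failed match resets pos(), so the next match starts over.
my $text2 = "1122a44";
while ( $text2 =~ /\G(\d\d)/g ) { }
my $pos_reset = pos($text2);                        # undef after the failure
my $first     = $text2 =~ /(\d\d)/g ? $1 : undef;   # begins at offset 0 again

print "anchored: @anchored\n";                      # anchored: 11 22
print "with /c: pos=$pos_after, next=$later\n";     # with /c: pos=4, next=44
print "without /c: pos=",
    ( defined $pos_reset ? $pos_reset : 'undef' ),
    ", next=$first\n";                              # without /c: pos=undef, next=11
```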

To be honest, from all these I can not understand what is the usefulness of this operator and when it should be applied.

As you can see, you get different results by adding a \G, so the usefulness is getting the result you want.

ikegami answered Nov 25 '22 10:11