Perl Regular Expression - What does the gc modifier mean?

I have a regex which matches some text as:

$text =~ m/$regex/gcxs 

Now I want to know what the 'gc' modifier means.

I have searched and found that gc means "Allow continued search after failed /g match".

This is not clear to me. What does "continued search" mean?

As far as I understand, it means that matching starts again at the beginning if the /g search fails. But doesn't the /g modifier already match against the whole string?

AnonGeek asked Jul 09 '12 12:07




2 Answers

In scalar context, the /g modifier makes Perl remember the position in the string where the last match ended, so you can process a string incrementally. For example:

my $txt = "abc3de";
while ( $txt =~ /\G[a-z]/g ) {
    print "$&";
}
while ( $txt =~ /\G./g ) {
    print "$&";
}

Because the position is reset on a failed match, the above will output

abcabc3de 

The /c flag does not reset the position on a failed match. So if we add /c to the first regex like so

my $txt = "abc3de";
while ( $txt =~ /\G[a-z]/gc ) {
    print "$&";
}
while ( $txt =~ /\G./g ) {
    print "$&";
}

We end up with

abc3de 

Sample code: http://ideone.com/cC9wb
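A common reason to reach for /gc is a small lexer, where several \G-anchored patterns are tried in turn at the same position and a failed attempt must not throw that position away. The following is only a minimal sketch under that assumption: the sub name `tokenize`, the token labels NUM, WORD and OTHER, and the input string are all made up for illustration and do not come from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into [type, text] pairs by trying
# several \G-anchored patterns at the current position.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;    # start scanning at the beginning of the string

    while ( pos($src) < length($src) ) {
        # Each alternative uses /gc, so when it fails the position is
        # kept and the next alternative is tried at the same offset.
        if    ( $src =~ /\G(\d+)/gc )    { push @tokens, [ NUM   => $1 ] }
        elsif ( $src =~ /\G([a-z]+)/gc ) { push @tokens, [ WORD  => $1 ] }
        elsif ( $src =~ /\G\s+/gc )      { }    # skip whitespace
        elsif ( $src =~ /\G(.)/gcs )     { push @tokens, [ OTHER => $1 ] }
    }
    return @tokens;
}

# Prints: WORD(abc) NUM(42) OTHER(+) WORD(x)
print join( ' ', map { "$_->[0]($_->[1])" } tokenize("abc 42+x") ), "\n";
```

Without the c flag, the first failed alternative would reset the position, so every later alternative would effectively start over from offset 0 instead of continuing where the previous token ended.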

Sodved answered Sep 28 '22 10:09


From perldoc perlre (http://perldoc.perl.org/perlre.html#Modifiers):

Global matching, and keep the Current position after failed matching. Unlike i, m, s and x, these two flags affect the way the regex is used rather than the regex itself. See Using regular expressions in Perl in perlretut for further explanation of the g and c modifiers.

That reference leads to:

http://perldoc.perl.org/perlretut.html#Using-regular-expressions-in-Perl

This page has a sub-section entitled 'Global matching', which contains a small tutorial with a working example, including:

A failed match or changing the target string resets the position. If you don't want the position reset after failure to match, add the //c , as in /regexp/gc . The current position in the string is associated with the string, not the regexp. This means that different strings have different positions and their respective positions can be set or read independently.
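The quoted passage can be checked with Perl's built-in pos() function, which reads or sets the position stored with a string. The snippet below is a small sketch; the variable names and sample strings are invented for illustration.

```perl
use strict;
use warnings;

my $first  = "aaa";
my $second = "bbbb";

# Each string keeps its own position after a scalar-context /g match.
$first  =~ /a/g;
$second =~ /bb/g;
my ( $p1, $p2 ) = ( pos($first), pos($second) );
print "$p1 $p2\n";    # 1 2

# The position can also be set by hand; the next /g match starts there.
pos($first) = 2;
$first =~ /a/g;
my $p3 = pos($first);
print "$p3\n";        # 3

# A failed match without /c resets the position (pos becomes undef).
$first =~ /x/g;
my $after_plain = defined( pos($first) ) ? "kept" : "reset";

# A failed match with /c leaves the position where it was.
pos($second) = 2;
$second =~ /x/gc;
my $after_c = defined( pos($second) ) ? "kept" : "reset";

print "$after_plain $after_c\n";    # reset kept
```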

HTH Lee

Lee Goddard answered Sep 28 '22 11:09