Perl regex: put all matches into an array, including the full match

Tags: regex, perl

I have the following Perl code:

use strict;
use warnings;
use Data::Dumper;

my $key     = 'foobar:foo:bar';
my $pattern = '^[^:]+:([a-z]{3}):(.+)$';
my @matches = $key =~ /$pattern/i;
print Dumper(@matches);

Output:

$VAR1 = 'foo';
$VAR2 = 'bar';

Alternatively, I can print $1 for the first capture group and $2 for the second.

What I want to know is how to get the full pattern match. For example, in PHP, preg_match would give me this:

Array
(
    [0] => foobar:foo:bar
    [1] => foo
    [2] => bar
)

Here the first element (or $0 or \0) is the full match. How do I get this in Perl?

asked Jan 08 '23 10:01 by slinkhi


2 Answers

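Wrap your whole regular expression in parentheses so that the entire match becomes an additional capture group. Because its opening parenthesis comes first, it becomes group 1 and is returned as the first element of the list.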

my @matches = $key =~ /($pattern)/i;

print Dumper( ["foobar:foo:bar"=~/$pattern/i] );
$VAR1 = [
      'foo',
      'bar'
    ];

print Dumper( ["foobar:foo:bar"=~/($pattern)/i] );
$VAR1 = [
      'foobar:foo:bar',
      'foo',
      'bar'
    ];
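Here is a minimal, self-contained sketch of this approach applied to the question's code; the use of say to print the result is just for illustration. One side effect worth knowing: the extra outer group shifts the numbered variables, so $1 now holds the full match and the original groups move to $2 and $3.

```perl
use strict;
use warnings;
use feature 'say';

my $key     = 'foobar:foo:bar';
my $pattern = '^[^:]+:([a-z]{3}):(.+)$';

# The extra outer parentheses make the whole pattern capture group 1,
# so the list returned by the match starts with the full match.
my @matches = $key =~ /($pattern)/i;

say for @matches;    # foobar:foo:bar, foo, bar (one per line)

# The numbered variables shift by one as a result:
# $1 is the full match, and the original groups are now $2 and $3.
say "full=$1 first=$2 second=$3" if @matches;
```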
answered Feb 02 '23 20:02 by mob


You can use the $& or ${^MATCH} variables for this, although there is a performance penalty (quite significant for $&) in Perl versions before 5.20. From perldoc perlvar:

  • $MATCH
  • $&

The string matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the current BLOCK).

[...]

  • ${^MATCH}

This is similar to $& ($MATCH) except that it does not incur the performance penalty associated with that variable.

[...]

In Perl v5.18 and earlier, it is only guaranteed to return a defined value when the pattern was compiled or executed with the /p modifier. In Perl v5.20, the /p modifier does nothing, so ${^MATCH} does the same thing as $MATCH.

This variable was added in Perl v5.10.0.
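Applied to the question's code, ${^MATCH} with /p might look like the sketch below (assuming Perl 5.10 or later; the variable names are only illustrative). It reads ${^MATCH} only inside the success branch, because after a failed match Perl's match variables keep the values from the previous successful match.

```perl
use strict;
use warnings;
use feature 'say';

my $key     = 'foobar:foo:bar';
my $pattern = '^[^:]+:([a-z]{3}):(.+)$';

my @matches;
# /p asks perl 5.10-5.18 to make ${^MATCH} available for this match;
# on 5.20 and later it is accepted but has no effect.
if ( my @captures = $key =~ /$pattern/ip ) {
    # Only read ${^MATCH} after a successful match: on failure it would
    # still hold whatever an earlier successful match left behind.
    @matches = ( ${^MATCH}, @captures );
}

say for @matches;
```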

Performance issues

Again from perldoc perlvar:

Traditionally in Perl, any use of any of the three variables $`, $& or $' (or their use English equivalents) anywhere in the code, caused all subsequent successful pattern matches to make a copy of the matched string, in case the code might subsequently access one of those variables. This imposed a considerable performance penalty across the whole program, so generally the use of these variables has been discouraged.

[...]

In Perl 5.10.0 the /p match operator flag and the ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} variables were introduced, that allowed you to suffer the penalties only on patterns marked with /p.

In Perl 5.18.0 onwards, perl started noting the presence of each of the three variables separately, and only copied that part of the string required; so in

$`; $&; "abcdefgh" =~ /d/

perl would only copy the "abcd" part of the string. That could make a big difference in something like

$str = 'x' x 1_000_000;
$&; # whoops
$str =~ /x/g # one char copied a million times, not a million chars

In Perl 5.20.0 a new copy-on-write system was enabled by default, which finally fixes all performance issues with these three variables, and makes them safe to use anywhere.

Example:

perl -wE 'say for "foo:bar" =~ /^(\w+):(\w+)$/p; say ${^MATCH}'

Output:

foo
bar
foo:bar
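Since $& is safe to use from Perl 5.20 onward, a small reusable helper can build the PHP-style array for any pattern. This is only a sketch: preg_match_array is a hypothetical name, not an existing Perl function. It relies on $#+ (documented in perlvar as the number of groups in the last successful match) to avoid the stray 1 that a match without capture groups returns in list context.

```perl
use v5.20;      # enables strict and say; $& has no global penalty from 5.20 on
use warnings;

# Hypothetical helper (not part of Perl or of the answers above): returns a
# PHP preg_match-style list (full match first, then each capture group),
# or an empty list if the pattern does not match.
sub preg_match_array {
    my ( $string, $regex ) = @_;
    my @captures = $string =~ $regex or return;
    # A pattern without groups returns (1) on success, so keep only as many
    # elements as the pattern has groups; $#+ holds that count.
    return ( $&, @captures[ 0 .. $#+ - 1 ] );
}

my @matches = preg_match_array( 'foobar:foo:bar', qr/^[^:]+:([a-z]{3}):(.+)$/i );
say for @matches;
```

Passing a precompiled qr// object keeps modifiers such as /i attached to the pattern itself, so the helper does not need a separate flags argument.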
answered Feb 02 '23 18:02 by ThisSuitIsBlackNot