How can I know which portion of a Perl regex is matched by a string?

Tags: regex, perl

I want to search the lines of a file to see whether any of them match one of a set of regexes.

Something like this:

use strict;
use warnings;

my @regs = (qr/a/, qr/b/, qr/c/);
while (my $line = <ARGV>) {
   foreach my $reg (@regs) {
      if ($line =~ $reg) {
         printf("matched %s\n", $reg);
      }
   }
}

But this can be slow, since every line is tested against each regex separately.

It seems like the regex compiler could help. Is there an optimization like this:

my $master_reg = join("|", @regs); # is this the right way to combine them?
while (my $line = <ARGV>) {
   $line =~ /$master_reg/;
   my $matched = special_function(); # hypothetical: index of the regex that matched
   printf("matched the %sth reg: %s\n", $matched, $regs[$matched]);
}

Here 'special_function' is the special sauce that tells me which portion of the combined regex matched.

mmccoo asked Jul 15 '11 00:07



2 Answers

Use capturing parentheses. The basic idea looks like this:

my @matches = $foo =~ /(one)|(two)|(three)/;
defined $matches[0]
    and print "Matched 'one'\n";
defined $matches[1]
    and print "Matched 'two'\n";
defined $matches[2]
    and print "Matched 'three'\n";
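Applied to the question's @regs, the same list-context idea might look like the sketch below. It assumes none of the patterns contains capturing groups of its own (otherwise the indices shift), and which_reg is a hypothetical helper name, not a built-in.

```perl
use strict;
use warnings;

# Example patterns from the question. Assumption: they contain no
# capturing groups of their own, or the indices below would shift.
my @regs = (qr/a/, qr/b/, qr/c/);

# Wrap each pattern in exactly one capturing group and join with '|'.
# A qr// object interpolates with its own flags, e.g. "(?^:a)".
my $master_reg = join '|', map { "($_)" } @regs;
$master_reg = qr/$master_reg/;

# Hypothetical helper: return the index into @regs of the alternative
# that matched, or undef if no pattern matched the line.
sub which_reg {
    my ($line) = @_;
    # In list context a match returns every group, with undef for the
    # groups that did not take part; a failed match returns ().
    my @caps = $line =~ $master_reg;
    return unless @caps;
    for my $i (0 .. $#caps) {
        return $i if defined $caps[$i];
    }
    return;
}

# Usage in the question's loop:
# while (my $line = <ARGV>) {
#     my $i = which_reg($line);
#     printf("matched the %sth reg: %s\n", $i, $regs[$i]) if defined $i;
# }
```

Note that the combined regex reports only one match per line: the leftmost one, and at that position the first alternative in @regs order. The nested loop in the question reports every pattern that matches, so the two are not equivalent when several patterns can match the same line.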
Nemo answered Sep 25 '22 14:09



Add capturing groups:

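if ("pear" =~ /(a)|(b)|(c)/) {
    # $1, $2, $3 are only reset by a successful match, so test them
    # inside the branch where the match is known to have succeeded
    if (defined $1) {
        print "Matched a\n";
    } elsif (defined $2) {
        print "Matched b\n";
    } else {
        print "Matched c\n";
    }
} else {
    print "No match\n";
}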

In this simple example you could just as well have used /(a|b|c)/ and printed $1, but when 'a', 'b', and 'c' are arbitrarily complex expressions, separate groups are a win because they tell you which alternative matched.
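Arbitrarily complex alternatives may contain capturing groups of their own, which shifts the numbering of every group after them. One way around that, sketched here as an alternative rather than part of the original answer, is to give each alternative a named group and read the %+ hash (named captures need Perl 5.10 or later). The r0, r1, ... names and which_reg_named are hypothetical choices for illustration.

```perl
use strict;
use warnings;

# Hypothetical patterns: the second has a capturing group of its own,
# which would throw off purely positional numbering.
my @regs = (qr/a/, qr/(b)+/, qr/c/);

# Wrap each alternative in a named group r0, r1, ... matching its
# index in @regs.
my $master_reg = join '|', map { "(?<r$_>$regs[$_])" } 0 .. $#regs;
$master_reg = qr/$master_reg/;

# %+ lists only named groups that actually captured, so after a
# successful match exactly one key is present (assuming the input
# patterns use no named groups themselves).
sub which_reg_named {
    my ($line) = @_;
    return unless $line =~ $master_reg;
    my ($name) = keys %+;
    return substr($name, 1);    # strip the leading "r" to get the index
}
```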

If you're building up the regex programmatically, you might find it painful to use the numbered variables. Rather than breaking strictness to get at them, look in the @- and @+ arrays, which hold the start and end offsets of the whole match and of each capturing group. $-[0] is always set as long as the pattern matched at all, but a higher $-[$n] is defined only if the nth capturing group matched.
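A minimal sketch of the @- approach, using the /(a)|(b)|(c)/ pattern from the example above; matched_group is a hypothetical helper name. It scans the groups by number instead of naming $1, $2, ... explicitly.

```perl
use strict;
use warnings;

# One capturing group per alternative, as in the example above.
my $re = qr/(a)|(b)|(c)/;

# Hypothetical helper: return the number of the capturing group that
# matched, or undef if the pattern did not match at all.
sub matched_group {
    my ($str) = @_;
    return unless $str =~ $re;
    # $#+ is the number of groups in the last successful match;
    # $-[$n] is undef for a group that did not take part in it.
    for my $n (1 .. $#+) {
        return $n if defined $-[$n];
    }
    return;
}
```

perlvar also documents $#- as the last matched subgroup, which for an alternation of single groups like this one is the same number, so the loop could be replaced by `return $#-;`.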

hobbs answered Sep 23 '22 14:09
