
How can I match against multiple regexes in Perl?

Tags: regex, perl

I would like to check whether a string matches any of a given set of regexes. How can I do that?

David B asked Sep 12 '10 09:09



2 Answers

Use smart matching if you have perl version 5.10 or newer!

#! /usr/bin/env perl

use warnings;
use strict;

use feature 'switch';

my @patterns = (
  qr/foo/,
  qr/bar/,
  qr/baz/,
);

for (qw/ blurfl bar quux foo baz /) {
  no warnings 'experimental::smartmatch';
  print "$_: ";
  given ($_) {
    when (@patterns) {
      print "hit!\n";
    }
    default {
      print "miss.\n";
    }
  }
}

Although you don’t see an explicit ~~ operator, Perl's given/when does it behind the scenes:

Most of the power comes from the implicit smartmatching that can sometimes apply. Most of the time, when(EXPR) is treated as an implicit smartmatch of $_, that is, $_ ~~ EXPR. (See Smartmatch Operator in perlop for more information on smartmatching.)

“Smartmatch Operator” in perlop gives a table of many combinations you can use, and the above code corresponds to the case where $a is Any and $b is Array, which corresponds roughly to

grep $a ~~ $_, @$b

except the search short-circuits, i.e., returns quickly on a match rather than processing all elements. In the implicit loop then, we’re smart matching Any against Regex, which is

$a =~ /$b/

Output:

blurfl: miss.
bar: hit!
quux: miss.
foo: hit!
baz: hit!

Addendum

Since this answer was originally written, Perl’s designers have realized there were mistakes in the way smartmatching works, so it is now considered an experimental feature. The case used above is not one of the controversial uses. Nonetheless, the code’s output would include the warnings "given is experimental" and "when is experimental" had I not added no warnings 'experimental::smartmatch';.

Any use of experimental features involves some risk, though I’d estimate the likelihood of breakage as low for this case. It is a potential gotcha to be aware of when upgrading code like the above to a newer version of Perl.
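
For code that has to avoid the experimental feature altogether, the same check can be written without smartmatching. The following is a minimal sketch rather than part of the original answer: matches_any is a hypothetical helper name, and it assumes List::Util 1.33 or newer, which provides any.

```perl
#! /usr/bin/env perl

use warnings;
use strict;

use List::Util 'any';   # any() needs List::Util 1.33 or newer

my @patterns = (
  qr/foo/,
  qr/bar/,
  qr/baz/,
);

# True if $string matches at least one regex in the list.
# any() stops at the first pattern that matches.
sub matches_any {
  my ($string, @regexes) = @_;
  return any { $string =~ $_ } @regexes;
}

for my $word (qw/ blurfl bar quux foo baz /) {
  print "$word: ", (matches_any($word, @patterns) ? "hit!" : "miss."), "\n";
}
```

This prints the same hit/miss lines as the given/when version, and it needs no warnings pragma because nothing experimental is involved.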

Greg Bacon answered Nov 15 '22 16:11


From perlfaq6's answer to "How do I efficiently match many regular expressions at once?", in this case the latest development version, which I just updated with a smart match example.


How do I efficiently match many regular expressions at once?

(contributed by brian d foy)

If you have Perl 5.10 or later, this is almost trivial. You just smart match against an array of regular expression objects:

my @patterns = ( qr/Fr.d/, qr/B.rn.y/, qr/W.lm./ );

if( $string ~~ @patterns ) {
    ...
    }

The smart match stops when it finds a match, so it doesn't have to try every expression.
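
When it also matters which pattern matched, an explicit short-circuiting search does the same job. The following is a sketch, not part of perlfaq6: first_match is a hypothetical helper name, the sample strings are made up, and first comes from the core List::Util module.

```perl
use strict;
use warnings;

use List::Util 'first';

my @patterns = ( qr/Fr.d/, qr/B.rn.y/, qr/W.lm./ );

# Return the first regex object that matches $string, or undef if
# none does. first() stops scanning as soon as its block returns true.
sub first_match {
    my ($string, @regexes) = @_;
    return first { $string =~ $_ } @regexes;
}

for my $string ( 'Barney Rubble', 'Dino' ) {
    my $hit = first_match( $string, @patterns );
    print "$string: ", ( defined $hit ? "matched $hit" : "no match" ), "\n";
}
```

The returned value is the compiled regex itself, so the caller can log it or reuse it. How it stringifies depends on the Perl version.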

Before Perl 5.10, you have a bit of work to do. You want to avoid compiling a regular expression every time you want to match it. In this example, perl must recompile the regular expression for every iteration of the foreach loop since it has no way to know what $pattern will be:

my @patterns = qw( foo bar baz );

LINE: while( <DATA> ) {
    foreach my $pattern ( @patterns ) {
        if( /\b$pattern\b/i ) {
            print;
            next LINE;
            }
        }
    }

The qr// operator showed up in perl 5.005. It compiles a regular expression, but doesn't apply it. When you use the pre-compiled version of the regex, perl does less work. In this example, I inserted a map to turn each pattern into its pre-compiled form. The rest of the script is the same, but faster:

my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );

LINE: while( <> ) {
    foreach my $pattern ( @patterns ) {
        if( /$pattern/ )
            {
            print;
            next LINE;
            }
        }
    }
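
To see what qr// hands back, the short sketch below (an illustration, not part of perlfaq6) checks that each element is a compiled Regexp object and then uses it directly in a match. The sample strings are made up.

```perl
use strict;
use warnings;

# Compile each word into a case-insensitive, word-bounded regex once.
my @patterns = map { qr/\b$_\b/i } qw( foo bar baz );

# qr// returns a reference blessed into the Regexp class.
print ref( $patterns[0] ), "\n";    # prints "Regexp"

# A compiled regex can be used on the right-hand side of =~ as is.
for my $text ( 'a FOO walks in', 'football season' ) {
    my $matched = grep { $text =~ $_ } @patterns;
    print "$text: ", ( $matched ? "match" : "no match" ), "\n";
}
```

The second string does not match because \b requires a word boundary after "foo", and "football" has none there.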

In some cases, you may be able to make several patterns into a single regular expression. Beware of situations that require backtracking though.

my $regex = join '|', qw( foo bar baz );

LINE: while( <> ) {
    print if /\b(?:$regex)\b/i;
    }
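
If the alternatives are literal strings rather than regex fragments, it is safer to escape them before joining, and the combined pattern can itself be compiled once with qr//. A sketch under those assumptions, not taken from perlfaq6, with a word list invented for illustration:

```perl
use strict;
use warnings;

# Literal strings that contain regex metacharacters.
my @words = ( 'foo', 'a.b', 'c++' );

# quotemeta escapes metacharacters so each word matches literally.
my $alternation = join '|', map { quotemeta($_) } @words;

# Compile the combined pattern once.
my $regex = qr/(?:$alternation)/i;

for my $text ( 'I like C++', 'axb', 'A.B testing' ) {
    print "$text: ", ( $text =~ $regex ? "match" : "no match" ), "\n";
}
```

Without quotemeta, the "." in a.b would match any character, so "axb" would match, and the unescaped "c++" would be rejected as a nested quantifier. The \b anchors from the example above are left out here because a word boundary does not behave as expected next to non-word characters such as "+".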

For more details on regular expression efficiency, see Mastering Regular Expressions by Jeffrey Friedl. He explains how the regular expressions engine works and why some patterns are surprisingly inefficient. Once you understand how perl applies regular expressions, you can tune them for individual situations.

brian d foy answered Nov 15 '22 15:11