How can I match against multiple regexes in Perl?

Question

I've seen this previous post about matching against multiple regexes: "How can I match against multiple regexes in Perl?"

I'm looking for the fastest way to match all the values contained in an array against a very big file (500 MB).

The patterns are read from stdin and may contain special characters that must be treated as regex syntax (anchors, character classes, etc.). A row counts as a match only when it contains all the patterns.

Currently I'm using a nested for loop, but I'm not very satisfied with the speed.

Thanks for your suggestions.

Schwern · Accepted Answer

Try Regexp::Assemble, as suggested in the post you linked to, and compare it to an iterative approach such as grep. Regexp::Assemble should produce the fastest solution, since Perl can optimize the joined regexes rather than scanning the whole line once per pattern. Since you don't know your input beforehand, your mileage may vary.
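
As a rough sketch of the joined-regex idea, the code below builds one alternation from all the patterns and uses it as a cheap first check before counting the individual matches. The plain join is only a stand-in so the example runs with core Perl; Regexp::Assemble would build a more optimized pattern. Note that an alternation matches when any pattern matches, so the "all patterns" requirement still needs the second step. The pattern list and the name line_matches_all are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical pattern list; in the question these are read from stdin.
my @patterns = ('^foo', 'ba[rz]', '\d+$');

# Compile each pattern once so the scan does not recompile them per line.
my @compiled = map { qr/$_/ } @patterns;

# One alternation that matches if ANY pattern matches. This is a plain
# join, standing in for the regex Regexp::Assemble would produce.
my $any = do {
    my $joined = join '|', map { "(?:$_)" } @patterns;
    qr/$joined/;
};

# True only when every pattern matches the line.
sub line_matches_all {
    my ($line) = @_;

    # Cheap rejection: if no pattern matches at all, skip the full check.
    return 0 unless $line =~ $any;

    # grep in scalar context counts the patterns that matched.
    my $hits = grep { $line =~ $_ } @compiled;
    return $hits == @compiled ? 1 : 0;
}
```

Whether the extra first check pays off depends on how many lines match none of the patterns, so it is worth timing both with and without it.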

The version of Perl you're using will affect performance. Perl 5.10 introduced a lot of optimizations for exactly this purpose (see "tries"). One of the biggest use cases is spam scanners like SpamAssassin, which build a big regex of all the patterns they scan for, just like Regexp::Assemble.

Finally, since your input is so large, it may be worthwhile to write the assembled regex to a file and then run grep -P -f $regex_file $big_file. -P tells grep to use Perl-compatible regular expressions. Putting the regex in a file avoids shell quoting problems and command-length limits. grep may blow the doors off Perl.
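
A minimal sketch of the file-based approach, assuming a grep that supports -P. Because the question needs every pattern on the same line, the sketch builds a single regex with one lookahead per pattern; that construction is my assumption and not something Regexp::Assemble does for you. Patterns that use numbered backreferences would need extra care, and anchor handling under grep -P is worth checking on your own data. The helper names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build one regex that requires every pattern somewhere on the line,
# using one lookahead per pattern.
sub all_of_regex {
    my (@patterns) = @_;
    return '^' . join('', map { "(?=.*?(?:$_))" } @patterns);
}

# Write the regex to a temporary file as a single line and return the
# file name. The file is removed when the program exits.
sub write_regex_file {
    my ($regex) = @_;
    my ($fh, $filename) = tempfile(UNLINK => 1);
    print {$fh} "$regex\n";
    close $fh or die "Cannot close $filename: $!";
    return $filename;
}

# Argument list for grep. Passing a list to system() bypasses the shell,
# so no quoting of the file names is needed.
sub grep_command {
    my ($regex_file, $big_file) = @_;
    return ('grep', '-P', '-f', $regex_file, $big_file);
}
```

With these in place, something like system(grep_command(write_regex_file(all_of_regex(@patterns)), $big_file)) would run the scan and print the matching lines.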

In the end, you're going to have to do the benchmarking.
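
For the benchmarking itself, the core Benchmark module is one option. The sketch below compares two candidates on made-up sample data: counting matches with grep, and a single regex with one lookahead per pattern. The data, the sub names, and the lookahead variant are assumptions for illustration; only timings on your real patterns and file will tell you anything.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data; real timings need the real patterns and file.
my @patterns = ('^foo', 'ba[rz]', '\d+$');
my @compiled = map { qr/$_/ } @patterns;
my @lines    = ("foo bar 42", "foo only", "nothing here") x 1000;

# Candidate 1: count the matching patterns with grep.
sub count_with_grep {
    my $count = 0;
    for my $line (@lines) {
        my $hits = grep { $line =~ $_ } @compiled;
        $count++ if $hits == @compiled;
    }
    return $count;
}

# Candidate 2: one regex with a lookahead per pattern, so a single match
# attempt requires all patterns.
my $all = do {
    my $joined = '^' . join('', map { "(?=.*?(?:$_))" } @patterns);
    qr/$joined/;
};

sub count_with_lookahead {
    my $count = 0;
    for my $line (@lines) {
        $count++ if $line =~ $all;
    }
    return $count;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    grep_count => \&count_with_grep,
    lookahead  => \&count_with_lookahead,
});
```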

Dov Grobgeld · Answer

Did you try using grep?

while (my $line = <>) {
    # grep in scalar context returns how many patterns matched this line
    my $hits = grep { $line =~ /$_/ } @regexps;
    if ($hits == @regexps) {
        # ... all patterns matched
    }
}
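
A possible variation on the loop above, as a sketch: compile each pattern once with qr// and stop checking a line at the first pattern that fails, since grep always tries every pattern. The names read_patterns and matches_all are hypothetical, and the sketch assumes one pattern per input line.

```perl
use strict;
use warnings;

# Read one pattern per line from a filehandle and compile each one once.
# An invalid pattern dies here rather than in the middle of the scan.
sub read_patterns {
    my ($fh) = @_;
    my @compiled;
    while (my $pattern = <$fh>) {
        chomp $pattern;
        next if $pattern eq '';    # skip empty lines
        push @compiled, qr/$pattern/;
    }
    return @compiled;
}

# Return 1 if every pattern matches the line, stopping at the first failure.
sub matches_all {
    my ($line, @regexps) = @_;
    for my $re (@regexps) {
        return 0 unless $line =~ $re;
    }
    return 1;
}
```

If the patterns arrive on stdin, read_patterns(\*STDIN) would collect them, and the big file can then be named on the command line so that <> reads it. Putting the patterns most likely to fail first should make the early exit more effective.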

Tags: regex, perl

user764169
