Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I build Perl regular expressions dynamically?

I have a Perl script that traverses a directory hierarchy using File::Next::files. It will only return to the script files that end in ".avi", ".flv", ".mp3", ".mp4", and ".wmv." Also it will skip the following sub directories: ".svn" and any sub directory that ends in ".frames." This is specified in the file_filter and descend_filter subroutines below.

my $iter = File::Next::files(
        { file_filter => \&file_filter, descend_filter => \&descend_filter },
        $directory );

sub file_filter { 
    # Called from File::Next:files.
    # Only select video files that end with the following extensions.
    /.(avi|flv|mp3|mp4|wmv)$/
}

sub descend_filter { 
    # Called from File::Next:files.
    # Skip subfolders that either end in ".frames" or are named the following:
    $File::Next::dir !~ /.frames$|^.svn$/
}

What I want to do is place the allowed file extensions and disallowed sub directory names in a configuration file so they can be updated on the fly.

What I want to know is how do I code the subroutines to build regex constructs based on the parameters in the configuration file?

/.(avi|flv|mp3|mp4|wmv)$/

$File::Next::dir !~ /.frames$|^.svn$/
like image 662
Dr. Faust Avatar asked May 22 '09 15:05

Dr. Faust


People also ask

What is \b in Perl regex?

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length. There are three different positions that qualify as word boundaries: Before the first character in the string, if the first character is a word character.

What does \d mean in Perl?

The Special Character Classes in Perl are as follows: Digit \d[0-9]: The \d is used to match any digit character and its equivalent to [0-9]. In the regex /\d/ will match a single digit. The \d is standardized to “digit”.

Is Perl a regex?

Regular Expression (Regex or Regexp or RE) in Perl is a special text string for describing a search pattern within a given text. Regex in Perl is linked to the host language and is not the same as in PHP, Python, etc. Sometimes it is termed as “Perl 5 Compatible Regular Expressions“.

How do I match a pattern in Perl?

m operator in Perl is used to match a pattern within the given text. The string passed to m operator can be enclosed within any character which will be used as a delimiter to regular expressions.


2 Answers

Assuming that you've parsed the configuration file to get a list of extensions and ignored directories, you can build the regular expression as a string and then use the qr operator to compile it into a regular expression:

my @extensions = qw(avi flv mp3 mp4 wmv);  # parsed from file
my $pattern    = '\.(' . join('|', @wanted) . ')$';
my $regex      = qr/$pattern/;

if ($file =~ $regex) {
    # do something
}

The compilation isn't strictly necessary; you can use the string pattern directly:

if ($file =~ /$pattern/) {
    # do something
}

Directories are a little harder because you have two different situations: full names and suffixes. Your configuration file will have to use different keys to make it clear which is which. e.g. "dir_name" and "dir_suffix." For full names I'd just build a hash:

%ignore = ('.svn' => 1);

Suffixed directories can be done the same way as file extensions:

my $dir_pattern = '(?:' . join('|', map {quotemeta} @dir_suffix), ')$';
my $dir_regex   = qr/$dir_pattern/;

You could even build the patterns into anonymous subroutines to avoid referencing global variables:

my $file_filter    = sub { $_ =~ $regex };
my $descend_filter = sub {
    ! $ignore{$File::Next::dir} &&
    ! $File::Next::dir =~ $dir_regex;
};

my $iter = File::Next::files({
    file_filter    => $file_filter,
    descend_filter => $descend_filter,
}, $directory);
like image 99
Michael Carman Avatar answered Oct 13 '22 08:10

Michael Carman


Lets say that you use Config::General for you config-file and that it contains these lines:

<MyApp>
    extensions    avi flv mp3 mp4 wmv
    unwanted      frames svn
</MyApp>

You could then use it like so (see the Config::General for more):

my $conf = Config::General->new('/path/to/myapp.conf')->getall();
my $extension_string = $conf{'MyApp'}{'extensions'};

my @extensions = split m{ }, $extension_string;

# Some sanity checks maybe...

my $regex_builder = join '|', @extensions;

$regex_builder = '.(' . $regex_builder . ')$';

my $regex = qr/$regex_builder/;

if($file =~ m{$regex}) {
    # Do something.
}


my $uw_regex_builder = '.(' . join ('|', split (m{ }, $conf{'MyApp'}{'unwanted'})) . ')$';
my $unwanted_regex = qr/$uw_regex_builder/;

if(File::Next::dir !~ m{$unwanted_regex}) {
    # Do something. (Note that this does not enforce /^.svn$/. You
    # will need some kind of agreed syntax in your conf-file for that.
}

(This is completely untested.)

like image 3
Anon Avatar answered Oct 13 '22 09:10

Anon