I have a Perl script that traverses a directory hierarchy using File::Next::files. It returns to the script only files that end in ".avi", ".flv", ".mp3", ".mp4" and ".wmv", and it skips the ".svn" subdirectory and any subdirectory whose name ends in ".frames". This is specified in the file_filter and descend_filter subroutines below:
use File::Next;
my $iter = File::Next::files(
    { file_filter => \&file_filter, descend_filter => \&descend_filter },
    $directory );
sub file_filter {
    # Called from File::Next::files; $_ holds the bare file name.
    # Only select video files that end with the following extensions.
    /\.(avi|flv|mp3|mp4|wmv)$/
}
sub descend_filter {
    # Called from File::Next::files; $_ holds the bare directory name
    # and $File::Next::dir holds its full path.
    # Skip subfolders that either end in ".frames" or are named ".svn".
    $_ !~ /\.frames$|^\.svn$/
}
What I want to do is place the allowed file extensions and disallowed subdirectory names in a configuration file so they can be updated on the fly.
How do I write the subroutines so that they build regexes like these from the parameters in the configuration file?
/\.(avi|flv|mp3|mp4|wmv)$/
$_ !~ /\.frames$|^\.svn$/
Assuming that you've parsed the configuration file to get a list of extensions and ignored directories, you can build the regular expression as a string and then use the qr operator to compile it into a regular expression:
my @extensions = qw(avi flv mp3 mp4 wmv); # parsed from file
my $pattern = '\.(' . join('|', @extensions) . ')$';
my $regex = qr/$pattern/;
if ($file =~ $regex) {
# do something
}
The compilation isn't strictly necessary; you can use the string pattern directly:
if ($file =~ /$pattern/) {
# do something
}
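As a self-contained sketch of the approach above, the following wraps the pattern-building step in a helper. The name build_ext_regex is hypothetical, and the use of quotemeta (to escape any regex metacharacters that sneak into the configured values) is an addition, not something the answer requires.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a regex matching any configured extension
# at the end of a file name. quotemeta() escapes regex metacharacters in
# the configured values, and the literal "\." requires a real dot.
sub build_ext_regex {
    my @extensions = @_;
    die "No extensions configured\n" unless @extensions;
    my $alternation = join '|', map { quotemeta($_) } @extensions;
    return qr/\.(?:$alternation)$/;
}

my $regex = build_ext_regex(qw(avi flv mp3 mp4 wmv));

for my $file (qw(clip.avi song.mp3 notes.txt movieavi clip.avi.bak)) {
    printf "%-14s %s\n", $file, ($file =~ $regex ? 'match' : 'skip');
}
```

Escaping the dot matters: with an unescaped "." the pattern would also accept a name like "movieavi", because "." matches any character.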
Directories are a little harder because you have two different situations: full names and suffixes. Your configuration file will have to use different keys to make it clear which is which, e.g. "dir_name" and "dir_suffix". For full names I'd just build a hash:
my %ignore = map { $_ => 1 } @dir_name;    # @dir_name parsed from file, e.g. ('.svn')
Suffixed directories can be done the same way as file extensions:
my $dir_pattern = '(?:' . join('|', map { quotemeta($_) } @dir_suffix) . ')$';
my $dir_regex = qr/$dir_pattern/;
You could even build the patterns into anonymous subroutines to avoid referencing global variables:
my $file_filter = sub { $_ =~ $regex };
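my $descend_filter = sub {
    # $_ is the bare directory name; $File::Next::dir is the full path.
    # Note !~ rather than "! ... =~", since ! binds tighter than =~.
    !$ignore{$_} && $_ !~ $dir_regex;
};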
my $iter = File::Next::files({
file_filter => $file_filter,
descend_filter => $descend_filter,
}, $directory);
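To check the closure idea without walking a real directory tree, the sketch below factors the descend filter into a hypothetical make_descend_filter helper and calls it with $_ set by hand, mimicking how File::Next sets $_ to the bare directory name before calling descend_filter. The helper name and the never-matching fallback qr/(?!)/ for an empty suffix list are my own choices, not part of File::Next.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Next): returns a closure for use as
# File::Next's descend_filter. The closure rejects a directory whose bare
# name (in $_) is listed in @$names, or that ends with one of @$suffixes.
sub make_descend_filter {
    my ($names, $suffixes) = @_;
    my %ignore = map { $_ => 1 } @$names;
    my $suffix_re = qr/(?!)/;    # default: a pattern that never matches
    if (@$suffixes) {
        my $alternation = join '|', map { quotemeta($_) } @$suffixes;
        $suffix_re = qr/(?:$alternation)$/;
    }
    return sub { !$ignore{$_} && $_ !~ $suffix_re };
}

my $descend = make_descend_filter(['.svn'], ['.frames']);

# Simulate File::Next, which sets $_ to the bare directory name.
for my $dir (qw(.svn clip.frames videos x.svn)) {
    local $_ = $dir;
    printf "%-12s %s\n", $dir, ($descend->() ? 'descend' : 'skip');
}
```

In a real script you would pass the returned closure as descend_filter => $descend to File::Next::files, with the two array references filled from the configuration file.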
Let's say that you use Config::General for your config file and that it contains these lines:
<MyApp>
extensions avi flv mp3 mp4 wmv
unwanted frames svn
</MyApp>
You could then use it like so (see the Config::General documentation for more):
use Config::General;
# getall() returns a plain hash, not a hash reference.
my %conf = Config::General->new('/path/to/myapp.conf')->getall();
my $extension_string = $conf{'MyApp'}{'extensions'};
my @extensions = split ' ', $extension_string;    # split on runs of whitespace
# Some sanity checks maybe...
my $regex_builder = join '|', @extensions;
$regex_builder = '\.(' . $regex_builder . ')$';
my $regex = qr/$regex_builder/;
if($file =~ m{$regex}) {
# Do something.
}
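The "Some sanity checks maybe..." comment could be filled in along these lines. This is only a sketch: clean_extensions is a hypothetical helper, and the rules it enforces (an optional leading dot is stripped, duplicates are dropped, only alphanumeric extensions are accepted) are assumptions about what a sensible config value looks like.

```perl
use strict;
use warnings;

# Hypothetical sanity checks for the extension list read from the config
# file: strip an optional leading dot, drop duplicates, and refuse any
# value that is not purely alphanumeric (so it cannot smuggle in regex syntax).
sub clean_extensions {
    my ($raw) = @_;
    my (%seen, @clean);
    for my $item (split ' ', defined $raw ? $raw : '') {
        (my $ext = $item) =~ s/^\.//;
        die "Bad extension '$item' in config\n" unless $ext =~ /^[A-Za-z0-9]+$/;
        push @clean, $ext unless $seen{$ext}++;
    }
    die "No extensions configured\n" unless @clean;
    return @clean;
}

my @extensions = clean_extensions('avi .flv mp3  mp4 wmv avi');
print "@extensions\n";    # prints "avi flv mp3 mp4 wmv"
```

Failing loudly on a bad entry is a design choice: since the list is meant to be edited on the fly, a typo in the config file is reported instead of silently producing a regex that matches nothing (or everything).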
my $uw_regex_builder = '\.(' . join('|', split(' ', $conf{'MyApp'}{'unwanted'})) . ')$';
my $unwanted_regex = qr/$uw_regex_builder/;
if ($File::Next::dir !~ m{$unwanted_regex}) {
    # Do something. (Note that this does not enforce /^\.svn$/; you
    # will need some kind of agreed syntax in your config file for that.)
}
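One possible "agreed syntax", purely hypothetical: in the unwanted value, an entry with a leading "*" is a suffix and anything else must match the whole directory name, so the config line would read "unwanted *.frames .svn". The sketch below folds both kinds into one regex; build_unwanted_regex is an illustrative name. Because the exact-name part is anchored with ^, the regex should be tested against the bare directory name ($_ in descend_filter), not the full path.

```perl
use strict;
use warnings;

# Hypothetical convention for the "unwanted" config value: an entry with a
# leading "*" is a suffix ("*.frames"); any other entry must match the whole
# directory name (".svn"). Both kinds are folded into one compiled regex.
sub build_unwanted_regex {
    my ($spec) = @_;
    my (@exact, @suffix);
    for my $entry (split ' ', $spec) {
        if ($entry =~ /^\*(.+)$/) {
            push @suffix, quotemeta($1);
        }
        else {
            push @exact, quotemeta($entry);
        }
    }
    my @alternatives;
    push @alternatives, '^(?:' . join('|', @exact) . ')$' if @exact;
    push @alternatives, '(?:' . join('|', @suffix) . ')$' if @suffix;
    return qr/(?!)/ unless @alternatives;    # nothing configured: never match
    my $pattern = join '|', @alternatives;
    return qr/$pattern/;
}

my $unwanted = build_unwanted_regex('*.frames .svn');

for my $dir (qw(.svn clip.frames x.svn videos)) {
    printf "%-12s %s\n", $dir, ($dir =~ $unwanted ? 'skip' : 'descend');
}
```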
(This is completely untested.)