I am working on a project which at one point gets a list of files from an ftp server. At that point it either returns an arrayref of files OR if an optional regex reference (i.e. qr
), is passed it filters the list down using grep. Further if that qr
has a capture group, it treats the captured section as a version number and returns instead a hashref where the keys are the versions and the values are the file names (which would have been returned as the array if no capture groups). The code looks like (simplified slightly)
sub filter_files {
my ($files, $pattern) = @_;
my @files = @$files;
unless ($pattern) {
return \@files;
}
@files = grep { $_ =~ $pattern } @files;
carp "Could not find any matching files" unless @files;
my %versions =
map {
if ($_ =~ $pattern and defined $1) {
( $1 => $_ )
} else {
()
}
}
@files;
if (scalar keys %versions) {
return \%versions;
} else {
return \@files;
}
}
This implementation tries to create the hash and returns it if it succeeds. My question, is can I detect that the qr
has a capture group and only attempt to create the hash if it does?
Regular expressions allow us to not just match text but also to extract information for further processing. This is done by defining groups of characters and capturing them using the special parentheses ( and ) metacharacters. Any subpattern inside a pair of parentheses will be captured as a group.
capturing in regexps means indicating that you're interested not only in matching (which is finding strings of characters that match your regular expression), but you're also interested in using specific parts of the matched string later on.
You could use something like:
sub capturing_groups{
my $re = shift;
"" =~ /|$re/;
return $#+;
}
say capturing_groups qr/fo(.)b(..)/;
Output:
2
See nparen
in Regexp::Parser
.
use strictures;
use Carp qw(carp);
use Regexp::Parser qw();
my $parser = Regexp::Parser->new;
sub filter_files {
my ($files, $pattern) = @_;
my @files = @$files;
return \@files unless $pattern;
carp sprintf('Could not inspect regex "%s": %s (%d)',
$pattern, $parser->errmsg, $parser->errnum)
unless $parser->regex($pattern);
my %versions;
@files = map {
if (my ($capture) = $_ =~ $pattern) {
$parser->nparen
? push @{ $versions{$capture} }, $_
: $_
} else {
()
}
} @files;
carp 'Could not find any matching files' unless @files;
return (scalar keys %versions)
? \%versions
: \@files;
}
Another possibility to avoid inspecting the pattern is to simply rely on the value of $capture
. It will be 1
(Perl true value) in the case of a successful match without capture. You can distinguish it from the occasional capture returning 1
because that one lack the IV
flag.
You could use YAPE::Regex to parse the regular expression to see if there is a capture present:
use warnings;
use strict;
use YAPE::Regex;
filter_files(qr/foo.*/);
filter_files(qr/(foo).*/);
sub filter_files {
my ($pattern) = @_;
print "$pattern ";
if (has_capture($pattern)) {
print "yes capture\n";
}
else {
print "no capture\n";
}
}
sub has_capture {
my ($pattern) = @_;
my $cap = 0;
my $p = YAPE::Regex->new($pattern);
while ($p->next()) {
if (scalar @{ $p->{CAPTURE} }) {
$cap = 1;
last;
}
}
return $cap;
}
__END__
(?-xism:foo.*) no capture
(?-xism:(foo).*) yes capture
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With