
Count the capture groups in a qr regex?

I am working on a project that, at one point, gets a list of files from an FTP server. It either returns an arrayref of files or, if an optional regex reference (i.e. a qr// object) is passed, filters the list down using grep. Furthermore, if that qr has a capture group, it treats the captured section as a version number and instead returns a hashref where the keys are the versions and the values are the file names (which would have been returned as the array if there were no capture groups). The code looks like this (simplified slightly):

sub filter_files {
  my ($files, $pattern) = @_;
  my @files = @$files;
  unless ($pattern) {
    return \@files;
  }

  @files = grep { $_ =~ $pattern } @files;
  carp "Could not find any matching files" unless @files;

  my %versions = 
    map { 
      if ($_ =~ $pattern and defined $1) { 
        ( $1 => $_ )
      } else {
        ()
      }
    } 
    @files;

  if (scalar keys %versions) {
    return \%versions;
  } else {
    return \@files;
  }
}

This implementation tries to create the hash and returns it if it succeeds. My question is: can I detect that the qr has a capture group, and only attempt to create the hash if it does?

Joel Berger asked Dec 28 '11 15:12



3 Answers

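You could use something like the following. Matching the empty string against /|$re/ always succeeds through the empty alternative, and after a successful match $#+ holds the number of capture groups in the pattern:

use strict;
use warnings;
use feature 'say';

# Return the number of capture groups in a compiled regex.
sub capturing_groups {
    my $re = shift;
    "" =~ /|$re/;   # always matches via the empty alternative
    return $#+;     # last index of @+ == number of groups
}

say capturing_groups qr/fo(.)b(..)/;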

Output:

2
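
As a rough sketch of how this counting trick could plug into the question's filter_files, assuming the same arguments and return conventions as the original: count the groups once, and only build the hash when the pattern has at least one. The helper name count_capture_groups is purely illustrative, and treating the first group as the version is an assumption carried over from the question.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical helper: number of capture groups in a compiled pattern.
# The empty alternative always matches "", so @+ is populated for this regex.
sub count_capture_groups {
    my ($re) = @_;
    "" =~ /|$re/;
    return $#+;
}

sub filter_files {
    my ($files, $pattern) = @_;
    return [@$files] unless $pattern;

    my @matched = grep { $_ =~ $pattern } @$files;
    carp "Could not find any matching files" unless @matched;

    # No capture groups in the pattern: return the plain filtered list.
    return \@matched unless count_capture_groups($pattern);

    # Assumption from the question: the first group is the version.
    my %versions;
    for my $file (@matched) {
        my ($version) = $file =~ $pattern;
        $versions{$version} = $file if defined $version;
    }

    # Fall back to the list if no group actually captured anything,
    # mirroring the behaviour of the original code.
    return %versions ? \%versions : \@matched;
}
```

With this shape the hash is only attempted when the pattern can produce a capture at all, at the cost of one extra (trivial) match per call.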
Qtax answered Sep 27 '22 20:09


See nparen in Regexp::Parser.

use strictures;
use Carp qw(carp);
use Regexp::Parser qw();
my $parser = Regexp::Parser->new;

sub filter_files {
    my ($files, $pattern) = @_;
    my @files = @$files;
    return \@files unless $pattern;

    carp sprintf('Could not inspect regex "%s": %s (%d)',
        $pattern, $parser->errmsg, $parser->errnum)
        unless $parser->regex($pattern);

    my %versions;
    @files = map {
        if (my ($capture) = $_ =~ $pattern) {
            $parser->nparen
                ? push @{ $versions{$capture} }, $_
                : $_
        } else {
            ()
        }
    } @files;
    carp 'Could not find any matching files' unless @files;

    return (scalar keys %versions)
        ? \%versions
        : \@files;
}

Another possibility, which avoids inspecting the pattern, is to simply rely on the value of $capture. In the case of a successful match without capture groups it will be 1 (Perl's true value). You can distinguish it from a capture that happens to be the string 1, because the captured value lacks the IV flag.
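
One way the flag check might look, as a sketch only: it uses the core B module to read the scalar's internal flags, and the helper name has_iv_flag is made up for illustration. The check is fragile, since a captured string can acquire integer flags later if it is used in numeric context, so it would have to run immediately after the match.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: true if the scalar currently carries the integer (IOK) flag.
# Inspects $_[0] directly so that no copy is made before looking at the flags.
sub has_iv_flag {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return ($flags & B::SVf_IOK()) ? 1 : 0;
}

# No capture group: a successful match in list context returns the list (1).
my ($no_group) = "foo-1" =~ /foo/;

# One capture group that happens to capture the string "1".
my ($group) = "foo-1" =~ /foo-(\d)/;

# Both values compare equal to "1", but only the first is Perl's true value.
my $no_group_is_true_value = has_iv_flag($no_group);
my $group_is_true_value    = has_iv_flag($group);
```

Counting the groups in the pattern up front is more robust than this; the flag check depends on interpreter internals.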

daxim answered Sep 27 '22 20:09


You could use YAPE::Regex to parse the regular expression to see if there is a capture present:

use warnings;
use strict;
use YAPE::Regex;

filter_files(qr/foo.*/);
filter_files(qr/(foo).*/);

sub filter_files {
    my ($pattern) = @_;
    print "$pattern ";
    if (has_capture($pattern)) {
        print "yes capture\n";
    }
    else {
        print "no capture\n";
    }
}

sub has_capture {
    my ($pattern) = @_;
    my $cap = 0;
    my $p = YAPE::Regex->new($pattern);
    while ($p->next()) {
        if (scalar @{ $p->{CAPTURE} }) {
            $cap = 1;
            last;
        }
    }
    return $cap;
}

__END__

(?-xism:foo.*) no capture
(?-xism:(foo).*) yes capture
toolic answered Sep 27 '22 20:09