Perl Regex - Get offset of all the matches instead of one

I want to search a file for a string and get the offsets of all the matches. The file's content is as follows:

sometext
sometext
AAA
sometext
AAA
AAA
sometext

I am reading this whole file into a string $text and then doing a regex match for AAA as follows:

if($text =~ m/AAA/g) {
    $offset = $-[0];
}

This gives the offset of only one AAA. How can I get the offsets of all the matches?

I know that we can get all matches in an array using syntax like this:

my @matches = ($text =~ m/AAA/g);

But I want the offsets, not the matched strings.

Currently I am using the following code to get the offsets of all matches:

my $text= "sometextAAAsometextAAA";
my $regex = 'AAA';
my @matches = ();

while ($text =~ /($regex)/gi){
    my $match = $1;
    my $length = length($&);
    my $pos = length($`);
    my $start = $pos + 1;
    my $end = $pos + $length;
    my $hitpos = "$start-$end";
    push @matches, "$match found at $hitpos ";
}

print "$_\n" foreach @matches;

But is there a simpler way to do this?

AnonGeek asked Jul 11 '12 19:07


2 Answers

You already know that you should use $-[0]! Replace

while ($text =~ /($regex)/gi){
    my $match = $1;
    my $length = length($&);
    my $pos = length($`);
    my $start = $pos + 1;
    my $end = $pos + $length;
    my $hitpos = "$start-$end";
    push @matches, "$match found at $hitpos ";
}

with

while ($text =~ /($regex)/gi){
    push @matches, "$1 found at $-[0]";
}

That said, I'm a big fan of separating calculations from output formatting, so I would do

while ($text =~ /($regex)/gi){
    push @matches, [ $1, $-[0] ];
}
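
As a minimal sketch of how the collected pairs might be used afterwards, the following runs the loop above on the sample string from the question and formats the output in a separate step. The @lines array is a name introduced here for illustration. Note that $-[0] is a 0-based offset, whereas the original loop in the question reported 1-based start positions.

```perl
use strict;
use warnings;

my $text  = "sometextAAAsometextAAA";
my $regex = 'AAA';
my @matches;

# Collect [matched text, start offset] pairs.
# $-[0] holds the 0-based start offset of the most recent successful match.
while ($text =~ /($regex)/gi) {
    push @matches, [ $1, $-[0] ];
}

# Formatting is done separately from the search (@lines is illustrative).
my @lines = map { "$_->[0] found at $_->[1]" } @matches;
print "$_\n" for @lines;
```

With this input the matches start at offsets 8 and 19.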

PS: Unless you've unrolled a while loop, if (/.../g) makes no sense. At best, the /g does nothing. At worst, you get incorrect results.
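
To illustrate the kind of incorrect result meant here, the sketch below uses a made-up string and two unrelated checks on it. A scalar-context match with /g leaves pos($text) set after a success, so the next /g match on the same string resumes from that position instead of from the start. The variable names and the sample string are assumptions for the example only.

```perl
use strict;
use warnings;

my $text = "xxAAAxxAAA";
my ($first, $second, $third);

# Matches AAA at offset 2 and leaves pos($text) at 5.
if ($text =~ /AAA/g) {
    $first = $-[0];
}

# Because of /g, this search resumes at offset 5, so it finds
# the second "xx" (offset 5) and misses the one at offset 0.
if ($text =~ /xx/g) {
    $second = $-[0];
}

# Without /g the search always starts at the beginning of the string.
if ($text =~ /xx/) {
    $third = $-[0];
}

print "AAA at $first, xx with /g at $second, xx without /g at $third\n";
```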

ikegami answered Oct 12 '22 01:10


I don't think there's a built-in way to do this in Perl. But from "How can I find the location of a regex match in Perl?":

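sub match_all_positions {
    my ($regex, $string) = @_;
    my @ret;
    # In scalar context, each /g match resumes where the previous one ended.
    while ($string =~ /$regex/g) {
        # $-[0] is the start offset of the match; $+[0] is the offset just past its end.
        push @ret, [ $-[0], $+[0] ];
    }
    return @ret;
}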
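
A possible way to call this function on the file content from the question is sketched below. The string is a reconstruction of that file and assumes each line ends with a single "\n"; @spans is a name introduced for the example. Each returned pair is [start, end], where end is the offset just past the match, so the length is end minus start.

```perl
use strict;
use warnings;

sub match_all_positions {
    my ($regex, $string) = @_;
    my @ret;
    while ($string =~ /$regex/g) {
        push @ret, [ $-[0], $+[0] ];
    }
    return @ret;
}

# Assumed reconstruction of the question's file, with "\n" line endings.
my $text = "sometext\nsometext\nAAA\nsometext\nAAA\nAAA\nsometext\n";

my @spans = match_all_positions(qr/AAA/, $text);

for my $span (@spans) {
    my ($start, $end) = @$span;
    my $matched = substr($text, $start, $end - $start);
    print "$matched at offsets $start to ", $end - 1, "\n";
}
```

Under that line-ending assumption, the three matches start at offsets 18, 31 and 35.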

slackwing answered Oct 12 '22 01:10