
Is there a way to evaluate the number of times a Perl regular expression has matched?

Tags: regex, perl

I've been poring over perldoc perlre as well as the Regular Expressions Cookbook and related questions on Stack Overflow, and I can't seem to find what appears to be a very useful expression: how do I know the number of the current match?

There are expressions for the last closed group match ($^N), contents of match 3 (\g{3} if I understood the docs correctly), $', $& and $`. But there doesn't seem to be a variable I can use that simply tells me what the number of the current match is.

Is it really missing? If so, is there any explained technical reason why it is a hard thing to implement, or am I just not reading the perldoc carefully enough?

Please note that I'm interested in a built-in variable, NOT workarounds like using (?{$count++}).

For context, I'm trying to build a regular expression that would match only some instances of a match (e.g. match all occurrences of character "E" but do NOT match occurrences 3, 7 and 10 where 3, 7 and 10 are simply numbers in an array). I ran into this when trying to construct a more idiomatic answer to this SO question.

I want to avoid evaluating regexes as strings to actually insert 3, 7 and 10 into the regex itself.

DVK asked Aug 11 '12 15:08



1 Answer

I'm completely ignoring the actual utility or wisdom of using this for the other question.

I thought @- or @+ might do what you want since they hold the offsets of the numbered matches, but it looks like the regex engine already knows what the last index will be:

use v5.14;

use Data::Printer;

$_ = 'abc123abc345abc765abc987abc123';

my @matches = m/
    ([0-9]+)
    (?{ 
        print 'Matched \$' . $#+ . " group with $^N\n";
        say p(@+);
    })
    .*?
    ([0-9]+)
    (?{ 
        print 'Matched \$' . $#+ . " group with $^N\n"; 
        say p(@+);
    })  
    /x;

say "Matches: @matches";

The output shows the last index as 2 even before $2 has matched:

Matched \$2 group with 123
[
    [0] 6,
    [1] 6,
    [2] undef
]
Matched \$2 group with 345
[
    [0] 12,
    [1] 6,
    [2] 12
]
Matches: 123 345

Notice that the first time around, $+[2] is undef, so that one hasn't been filled in yet. You might be able to do something with that, but I think that's probably getting away from the spirit of your question. If you were really fancy, you could create a tied scalar that has the value of the last defined index in @+, I guess.
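
A minimal sketch of that tied-scalar idea, under some assumptions: the package name LastDefinedGroup and the variable $last_group are made up for illustration, and the scalar is read only after a completed match. How it behaves when read from inside a (?{ }) block is not verified here.

```perl
use strict;
use warnings;

# Hypothetical helper package (not a core or CPAN module): a read-only
# tied scalar whose value is the highest capture group index that has a
# defined end offset in @+.
package LastDefinedGroup;

sub TIESCALAR {
    my ($class) = @_;
    my $self;
    return bless \$self, $class;
}

sub FETCH {
    # The match variables are dynamically scoped, so @+ here still
    # describes the caller's most recent successful match.
    for my $i ( reverse 1 .. $#+ ) {
        return $i if defined $+[$i];
    }
    return 0;    # no capture group has an end offset
}

sub STORE { die "LastDefinedGroup is read-only\n" }

package main;

tie my $last_group, 'LastDefinedGroup';

# Only the first alternative takes part, so $+[2] stays undef.
'abc123' =~ /([a-z]+)|([0-9]+)/ or die "no match";
my $after_letters = $last_group;

# Only the second alternative takes part, so $+[1] stays undef.
'123abc' =~ /([a-z]+)|([0-9]+)/ or die "no match";
my $after_digits = $last_group;

print "after letters: $after_letters\n";    # 1
print "after digits:  $after_digits\n";     # 2
```

Before reaching for tie, it may be worth checking perlvar: it documents $#- as the last matched subgroup of the last successful match, in contrast to $#+, which counts the subgroups in the pattern.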

brian d foy answered Oct 16 '22 03:10
