Perl pattern matching with pattern arithmetic

Tags:

regex

perl

I want to understand how I can do arithmetic on matched sub-patterns in a Perl regex. This is just sample code: I want to know how I can use \1 (an already matched sub-pattern, in this case 7) to match that value plus one (8).

my $y = 77668;
if($y =~ /(\d)\1(\d)\2\1+1/)   #How to increment a previously
                               #matched sub-pattern and form a pattern?
{
    print $y;
}

EDIT

From the answers, I see that pattern arithmetic is not possible.

This is what I want to achieve.
I want to form a regex which will match this pattern:

N-3N-2N-1NNN+1N+2N+3    (N = 3,4,5,6)
cppcoder asked Dec 16 '22 17:12


2 Answers

It's possible via regex code blocks:

my $y = 77668;
if($y =~ /(\d)\1(\d)\2(??{$1+1})/ ) {
    print $y;
}

In this snippet, (??{ CODE }) returns another regex that must match at that point, so here it behaves like the pattern "8" ($1+1). As a result, the whole regex matches only if the 5th digit is greater than the 1st by 1. One drawback: if the 1st digit is 9, the code block returns "10", which is possibly the wrong behavior, but you said nothing about what should happen in that case.
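One possible way to deal with the 9 case, assuming a first digit of 9 should simply never match: let the code block return a pattern that always fails. This is only a sketch, and the helper name matches_incremented is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string contains AABB followed by A+1.
# Assumption: when A is 9 there is no single-digit successor, so we fail.
sub matches_incremented {
    my ($str) = @_;
    return $str =~ /(\d)\1(\d)\2(??{ $1 < 9 ? $1 + 1 : '(*FAIL)' })/ ? 1 : 0;
}

print matches_incremented("77668")  ? "match\n" : "no match\n";  # 7,7,6,6 then 8
print matches_incremented("991110") ? "match\n" : "no match\n";  # 9,9,1,1 then "10"
```

Without the guard, the second string would match, because "10" is found literally after "9911".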

Now about the N-3N-2N-1NNN+1N+2N+3 question: you can match it with this regex:

my $n = 5;
if( $y =~ /(??{ ($n-3).($n-2).($n-1).$n.($n+1).($n+2).($n+3) })/ ){

Or more "scalable" way:

my $n = 5;
if( $y =~ /(??{ $s=''; $s .= $n+$_ foreach(-3..3); $s; })/ ){

Again, what should happen if $n == 2? Then $n-3 is -1, which is not a single digit because it carries a sign, so you should think about such cases.
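A minimal sketch of one way to handle that, assuming out-of-range values of N should just be rejected. It follows this answer's reading of the pattern (each value once), and digit_run is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: build the literal digit run N-3 .. N+3 for a given N.
# Assumption: every element must be a single digit 0..9, so only N = 3..6 is valid.
sub digit_run {
    my ($n) = @_;
    return if $n - 3 < 0 or $n + 3 > 9;       # undef in scalar context
    return join '', map { $n + $_ } -3 .. 3;
}

my $y = "9912345678";
for my $n (2, 5) {
    my $run = digit_run($n);
    if (!defined $run) {
        print "N=$n: out of range\n";
    }
    elsif ($y =~ /\Q$run\E/) {
        print "N=$n: matched $run\n";
    }
    else {
        print "N=$n: no match\n";
    }
}
```

Building the string before the match also means no code block is needed inside the regex.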

Another way: match what we have and then check it.

if( $y =~ /(\d)(\d)(\d)(\d)(\d)(\d)(\d)/ ) {
    if( $1 == ($4-3) && $2 == ($4-2) && $3 == ($4-1) && $5 == ($4+1) && $6 == ($4+2) && $7 == ($4+3) ) {
        #...

This method seems a little clumsy, but it's obvious to everyone (I hope).

Also, you can optimize your regex, since a streak of 7 ascending digits is not a very frequent combination, and get some lulz from co-workers xD:

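sub check_number {
    my $i;
    # $^N holds the text of the most recently closed capture group
    for($i=1; $i<length($^N); $i++) {
        # stop at the first digit that is not exactly one more than the previous one
        last if substr($^N, $i, 1) != substr($^N, $i-1, 1)+1;
    }
    return $i<length($^N) ? "(*FAIL)" : "(*ACCEPT)";
}

# the capture group is needed so that $^N is set when check_number() runs
if( $y =~ /([0123][1234][2345][3456][4567][5678][6789])(??{ check_number() })/ ) {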

Or... maybe the most human-friendly method:

if( $y =~ /0123456|1234567|2345678|3456789/ ) {

Seems the last variant is bingo xD It's a good example of not reaching for a clever regex when things are this simple)
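If typing the alternatives by hand feels error-prone, the same alternation can be generated. A small sketch, assuming N runs over 3..6 as in the question; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Assumption: N ranges over 3..6, as stated in the question.
# For each N, join the digits N-3 .. N+3 into one alternative.
my $alternation = join '|', map { my $n = $_; join '', $n - 3 .. $n + 3 } 3 .. 6;
my $re = qr/$alternation/;

print "$alternation\n";   # 0123456|1234567|2345678|3456789
print "xx1234567yy" =~ $re ? "match\n" : "no match\n";
```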

PSIAlt answered Jan 08 '23 07:01


Of course this is possible. We are talking about Perl regexes after all. But it will be rather ugly:

say "55336"=~m{(\d)\1(\d)\2(\d)(?(?{$1+1==$3})|(*F))}?"match":"fail";

or pretty-printed:

say "55336" =~ m{  (\d)\1 (\d)\2 (\d)
                   (?(?{$1+1==$3})   # true-branch: nothing
                                   |(*FAIL)
                   )
                }x
     ? "match" : "fail";

What does this do? We collect the digits in ordinary captures. At the end, we use an if-else pattern:

(?(CONDITION) TRUE | FALSE )

We can embed code into a regex with (?{ code }). The return value of this code can be used as a condition. The (*FAIL) (short: (*F)) verb makes the match fail at that point, which forces the engine to backtrack and try other alternatives or start positions. Use (*PRUNE) if you want to stop the engine from backtracking past a given point.

Embedded code is also great for debugging. However, older perls cannot use regexes inside this regex code :-(
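As a rough illustration of the debugging use, a code block can record where the engine currently is. This sketch relies on pos() being usable inside (?{ }) as described in perlre; @trace is a made-up name, declared with our to stay clear of the lexical-closure problems older perls have in code blocks.

```perl
use strict;
use warnings;

# Record the position reached each time the first \d has matched.
our @trace;
"a1b22" =~ /\d(?{ push @trace, pos() })\d/;

print "positions seen after the first digit: @trace\n";
```

Swapping the push for a print statement gives a quick trace of which paths the engine tries.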

So we can match lots of stuff and test it for validity inside the pattern itself. However, it might be a better idea to do that outside of the pattern like:

 "string" =~ /regex/ and (conditions)

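A minimal sketch of that outside-the-pattern style for the 55336 example; the function name pairs_then_next is hypothetical.

```perl
use strict;
use warnings;

# Capture with a plain regex, then do the arithmetic in ordinary Perl.
sub pairs_then_next {
    my ($s) = @_;
    return ($s =~ /(\d)\1(\d)\2(\d)/ && $1 + 1 == $3) ? 1 : 0;
}

for my $s ("55336", "55337") {
    print "$s: ", (pairs_then_next($s) ? "match" : "fail"), "\n";
}
```

One difference to keep in mind: the outside check only looks at the first place where the plain regex matches, while a condition inside the pattern lets the engine move on and try later positions.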
Now to your main pattern N-3N-2N-1NNN+1N+2N+3 (I hope I parsed it correctly):

my $super_regex = qr{
        # N -3 N-2 N-1 N N N+1 N+2 N+3
        (\d)-3\1-2\1-1\1\1(\d)(\d)(\d)
        (?(?{$1==$2-1 and $1==$3-2 and $1==$4-3})|(*F))
    }x;

say "4-34-24-144567" =~ $super_regex ? "match" : "fail";

Or did you mean

my $super_regex = qr{
        #N-3 N-2 N-1  N  N   N+1 N+2 N+3
        (\d)(\d)(\d) (\d)\4 (\d)(\d)(\d)
        (?(?{$1==$4-3 and $2==$4-2 and $3==$4-1 and
               $5==$4+1 and $6==$4+2 and $7==$4+3})|(*F))
    }x;

say "123445678" =~ $super_regex ? "match" : "fail";

The scary thing is that these even work (with perl 5.12).

We could also generate parts of the pattern at match-time with the (??{ code }) construct — the return value of this code is used as a pattern:

my $super_regex = qr{(\d)(??{$1+1})(??{$1+2})}x;
say "234"=~$super_regex ? "match":"fail";

et cetera. However, I think readability suffers more this way.

If you need more than nine captures, you can use named captures with the

(?<named>pattern) ... \k<named>

constructs. The contents are also available in the %+ hash, see perlvar for that.
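For illustration, here is the 55336 example again with named captures. The group names first, second and last are arbitrary, and named captures need perl 5.10 or later.

```perl
use strict;
use warnings;

my $re = qr{
    (?<first>\d)  \k<first>     # a doubled digit
    (?<second>\d) \k<second>    # another doubled digit
    (?<last>\d)
}x;

my %caps;
if ("55336" =~ $re) {
    %caps = %+;    # copy, since %+ changes with the next successful match
    printf "first=%s second=%s last=%s\n", @caps{qw(first second last)};
    print "last is first+1\n" if $caps{last} == $caps{first} + 1;
}
```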

To dive further into the secrets of Perl regexes, I recommend reading perlre a few times.

amon answered Jan 08 '23 07:01