Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What Perl regex can match the contiguous subset of digits 12345?

Tags:

regex

perl

I would like a Perl regex that matches any contiguous subset of the string '12345'.

I'm probably just having a brain-freeze, but this is my test code and current best regex. I can see how to brute-force the situation by adding alternatives, but I'm wondering which elegant alternative I'm missing. [I don't specifically need captures for the digits; I have left the sample regex without non-capturing parentheses to make it slightly less cluttered.]

Test case:

use strict;
use warnings;

my @good = qw( 1 12 123 1234 12345 2 23 234 2345 3 34 345 4 45 5);
my @bad  = qw( 0 6 13 134 1345 145 15 124 1245 125 1235 24 245 25
               35 21 32 43 54 543 5432 54321);

my $qr = qr/^(1?(2?(3(4(5)?)?)?)?)$/;   # 3 'good', 3 'bad' failures
#my $qr = qr/^(1?(2(3(4(5)?)?)?)?)$/;   # 6 'good' failures.
my $fail = 0;

foreach my $opt (@good)
{
    printf "GOOD %d: $opt - missed by regex\n", ++$fail if ($opt !~ /$qr/);
}

foreach my $opt (@bad)
{
    printf "BAD %d: $opt - allowed by regex\n", ++$fail if ($opt =~ /$qr/);
}

print(($fail == 0) ? "PASS\n" : "FAIL\n");

Sample outputs:

Case 1 (commented out):

GOOD 1: 3 - missed by regex
GOOD 2: 34 - missed by regex
GOOD 3: 345 - missed by regex
GOOD 4: 4 - missed by regex
GOOD 5: 45 - missed by regex
GOOD 6: 5 - missed by regex
FAIL

Case 2 (active):

GOOD 1: 4 - missed by regex
GOOD 2: 45 - missed by regex
GOOD 3: 5 - missed by regex
BAD 4: 13 - allowed by regex
BAD 5: 134 - allowed by regex
BAD 6: 1345 - allowed by regex
FAIL

So, can you write a nice simple, symmetric regex that matches what I want and not what I don't?


This regex lets the test case pass, but isn't as elegant as I was hoping for:

my $qr = qr/^((1?(2(3(4(5)?)?)?)?)|(3?(4(5)?)?)|5)$/;

Test case with Justin's solution

use strict;
use warnings;

my @good = qw( 1 12 123 1234 12345 2 23 234 2345 3 34 345 4 45 5);
my @bad  = qw( 0 6 13 134 1345 145 15 124 1245 125 1235 24 245 25
               35 21 32 43 54 543 5432 54321 11 122 1233 1223 12234);

#my $qr = qr/^(1?(2?(3(4(5)?)?)?)?)$/;   # 3 'good', 3 'bad' failures
#my $qr = qr/^(1?(2(3(4(5)?)?)?)?)$/;    # 6 'good' failures.
#my $qr = qr/^((1?(2(3(4(5)?)?)?)?)|(3?(4(5)?)?)|5)$/;  # Passes

# Ysth's solution - passes
#my $qr = qr/^[12345](?:(?<=1)2|(?<=2)3|(?<=3)4|(?<=4)5)*$/;

my $fail = 0;

foreach my $opt (@good)
{
    printf "GOOD %d: $opt - missed by regex\n", ++$fail if ('12345' !~ /$opt/);
    #printf "GOOD %d: $opt - missed by regex\n", ++$fail if ($opt !~ /$qr/);
}

foreach my $opt (@bad)
{
    printf "BAD %d: $opt - allowed by regex\n", ++$fail if ('12345' =~ /$opt/);
    #printf "BAD %d: $opt - allowed by regex\n", ++$fail if ($opt =~ /$qr/);
}

print(($fail == 0) ? "PASS\n" : "FAIL\n");
like image 834
Jonathan Leffler Avatar asked Oct 24 '10 04:10

Jonathan Leffler


People also ask

What is \b in Perl regex?

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length.

What does !~ Mean in Perl?

!~ is the negation of the binding operator =~ , like != is the negation of the operator == . The expression $foo !~ /bar/ is equivalent, but more concise, and sometimes more expressive, than the expression !($foo =~ /bar/)

What does \s+ mean in Perl?

(\S+) | will match and capture any number (one or more) of non-space characters, followed by a space character (assuming the regular expression isn't modified with a /x flag). In both cases, these constructs appear to be one component of an alternation.

How do I match a pattern in Perl?

m operator in Perl is used to match a pattern within the given text. The string passed to m operator can be enclosed within any character which will be used as a delimiter to regular expressions.


3 Answers

Reverse the match:

'12345' =~ /$opt/
like image 107
Justin Avatar answered Nov 15 '22 06:11

Justin


Here's a revised version of Justin's idea:

index('12345', $opt) >= 0;

Or, if you need to exclude the empty string

index('12345', $opt) >= 0 and length $opt;

This way, you don't need to check $opt for regex metachars. I'm not sure which version woud be faster.

like image 31
cjm Avatar answered Nov 15 '22 06:11

cjm


/^[12345](?:(?<=1)2|(?<=2)3|(?<=3)4|(?<=4)5)*\z/

Sorry, got it wrong twice. This should do it. The explicit list of all possible matches is going to be faster, though.

like image 24
ysth Avatar answered Nov 15 '22 07:11

ysth