I would like a Perl regex that matches any contiguous subset of the string '12345'.
I'm probably just having a brain-freeze, but this is my test code and current best regex. I can see how to brute-force the situation by adding alternatives, but I'm wondering which elegant alternative I'm missing. [I don't specifically need captures for the digits; I have left the sample regex without non-capturing parentheses to make it slightly less cluttered.]
use strict;
use warnings;
my @good = qw( 1 12 123 1234 12345 2 23 234 2345 3 34 345 4 45 5);
my @bad = qw( 0 6 13 134 1345 145 15 124 1245 125 1235 24 245 25
35 21 32 43 54 543 5432 54321);
my $qr = qr/^(1?(2?(3(4(5)?)?)?)?)$/; # 3 'good', 3 'bad' failures
#my $qr = qr/^(1?(2(3(4(5)?)?)?)?)$/; # 6 'good' failures.
my $fail = 0;
foreach my $opt (@good)
{
printf "GOOD %d: $opt - missed by regex\n", ++$fail if ($opt !~ /$qr/);
}
foreach my $opt (@bad)
{
printf "BAD %d: $opt - allowed by regex\n", ++$fail if ($opt =~ /$qr/);
}
print(($fail == 0) ? "PASS\n" : "FAIL\n");
Case 1 (commented out):
GOOD 1: 3 - missed by regex
GOOD 2: 34 - missed by regex
GOOD 3: 345 - missed by regex
GOOD 4: 4 - missed by regex
GOOD 5: 45 - missed by regex
GOOD 6: 5 - missed by regex
FAIL
Case 2 (active):
GOOD 1: 4 - missed by regex
GOOD 2: 45 - missed by regex
GOOD 3: 5 - missed by regex
BAD 4: 13 - allowed by regex
BAD 5: 134 - allowed by regex
BAD 6: 1345 - allowed by regex
FAIL
So, can you write a nice simple, symmetric regex that matches what I want and not what I don't?
This regex lets the test case pass, but isn't as elegant as I was hoping for:
my $qr = qr/^((1?(2(3(4(5)?)?)?)?)|(3?(4(5)?)?)|5)$/;
use strict;
use warnings;
my @good = qw( 1 12 123 1234 12345 2 23 234 2345 3 34 345 4 45 5);
my @bad = qw( 0 6 13 134 1345 145 15 124 1245 125 1235 24 245 25
35 21 32 43 54 543 5432 54321 11 122 1233 1223 12234);
#my $qr = qr/^(1?(2?(3(4(5)?)?)?)?)$/; # 3 'good', 3 'bad' failures
#my $qr = qr/^(1?(2(3(4(5)?)?)?)?)$/; # 6 'good' failures.
#my $qr = qr/^((1?(2(3(4(5)?)?)?)?)|(3?(4(5)?)?)|5)$/; # Passes
# Ysth's solution - passes
#my $qr = qr/^[12345](?:(?<=1)2|(?<=2)3|(?<=3)4|(?<=4)5)*$/;
my $fail = 0;
foreach my $opt (@good)
{
printf "GOOD %d: $opt - missed by regex\n", ++$fail if ('12345' !~ /$opt/);
#printf "GOOD %d: $opt - missed by regex\n", ++$fail if ($opt !~ /$qr/);
}
foreach my $opt (@bad)
{
printf "BAD %d: $opt - allowed by regex\n", ++$fail if ('12345' =~ /$opt/);
#printf "BAD %d: $opt - allowed by regex\n", ++$fail if ($opt =~ /$qr/);
}
print(($fail == 0) ? "PASS\n" : "FAIL\n");
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length.
!~ is the negation of the binding operator =~ , like != is the negation of the operator == . The expression $foo !~ /bar/ is equivalent, but more concise, and sometimes more expressive, than the expression !($foo =~ /bar/)
(\S+) | will match and capture any number (one or more) of non-space characters, followed by a space character (assuming the regular expression isn't modified with a /x flag). In both cases, these constructs appear to be one component of an alternation.
m operator in Perl is used to match a pattern within the given text. The string passed to m operator can be enclosed within any character which will be used as a delimiter to regular expressions.
Reverse the match:
'12345' =~ /$opt/
Here's a revised version of Justin's idea:
index('12345', $opt) >= 0;
Or, if you need to exclude the empty string
index('12345', $opt) >= 0 and length $opt;
This way, you don't need to check $opt
for regex metachars. I'm not sure which version woud be faster.
/^[12345](?:(?<=1)2|(?<=2)3|(?<=3)4|(?<=4)5)*\z/
Sorry, got it wrong twice. This should do it. The explicit list of all possible matches is going to be faster, though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With