I wrote a regex that validates an input string. The string must be at least 8 characters long (consisting of alphanumeric and punctuation characters) and must contain at least one digit and one alphabetic character. I came up with this regex:
^(?=.*[0-9])(?=.*[a-zA-Z])[a-zA-Z0-9-,._;:]{8,}$
Now I have to rewrite this regex for a language that doesn't support lookahead. How should I rewrite it?
Valid inputs are:
1foo,bar
foo,bar1
1fooobar
foooobar1
fooo11bar
1234x567
a1234567
Invalid inputs:
fooo,bar
1234-567
.1234567
There are two approaches. One is to compose a single expression which handles all possible alternatives:
^[a-zA-Z][0-9][a-zA-Z0-9-,._;:]{6,}$
|
^[a-zA-Z][a-zA-Z0-9-,._;:][0-9][a-zA-Z0-9-,._;:]{5,}$
|
^[a-zA-Z][a-zA-Z0-9-,._;:]{2}[0-9][a-zA-Z0-9-,._;:]{4,}$
and so on. This is a combinatorial nightmare, but it would work.
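To give a feel for how large the single-expression approach gets, here is a rough Python sketch that generates such an alternation instead of writing it by hand. It is only one possible enumeration and not necessarily the one the alternatives above were heading toward: it distributes the six "spare" characters over the three gaps around the required letter and digit, using only `{n,}` quantifiers and no lookaround. The names `ALLOWED`, `ALPHA`, `DIGIT`, `build_alternatives` and `is_valid_single` are made up for this example, and the hyphen is moved to the end of the character class so that it is unambiguously a literal.

```python
import re

# Hypothetical building blocks for this sketch (not from the original answer).
ALLOWED = "[A-Za-z0-9,._;:-]"  # permitted characters; hyphen last so it is literal
ALPHA = "[A-Za-z]"
DIGIT = "[0-9]"
MIN_LENGTH = 8


def build_alternatives(min_length=MIN_LENGTH):
    """Return lookahead-free alternatives covering every valid string.

    Two positions are taken by the required letter and digit, so the
    remaining (min_length - 2) characters are split over the three gaps
    before, between, and after them. Each gap uses an open-ended {n,}
    quantifier, so longer strings are covered as well.
    """
    spare = min_length - 2
    alternatives = []
    for first, second in ((ALPHA, DIGIT), (DIGIT, ALPHA)):
        for a in range(spare + 1):
            for b in range(spare - a + 1):
                c = spare - a - b
                alternatives.append(
                    f"{ALLOWED}{{{a},}}{first}"
                    f"{ALLOWED}{{{b},}}{second}"
                    f"{ALLOWED}{{{c},}}"
                )
    return alternatives


ALTERNATIVES = build_alternatives()
SINGLE_PATTERN = re.compile("|".join(ALTERNATIVES))


def is_valid_single(text):
    # fullmatch anchors the pattern at both ends of the string
    return SINGLE_PATTERN.fullmatch(text) is not None


print(len(ALTERNATIVES))            # 56 alternatives for a minimum length of 8
print(is_valid_single("1234x567"))  # True
print(is_valid_single("fooo,bar"))  # False (no digit)
```

Even for two required character classes the pattern already needs 56 branches, and adding a third class (say, one punctuation character) would multiply that again.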
A much simpler approach is to validate the same string twice using two expressions:
^[a-zA-Z0-9-,._;:]{8,}$ # check length and permitted characters
and
[a-zA-Z].*[0-9]|[0-9].*[a-zA-Z] # check required characters
EDIT: @briandfoy correctly points out that it will be more efficient to search for each required character separately:
[a-zA-Z] # check for required alpha
and
[0-9] # check for required digit
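As a quick sanity check, both variants can be tried against the sample inputs from the question. The following is a minimal Python sketch, assuming the checks are simply combined with `and`; the function and constant names are invented for the example, and the hyphen is placed last in the character class so it cannot be read as a range.

```python
import re

# Length and permitted characters (used with fullmatch, so no ^ or $ needed)
ALLOWED_AND_LONG_ENOUGH = re.compile(r"[A-Za-z0-9,._;:-]{8,}")

# Variant 1: one pattern for both required characters, in either order
ALPHA_AND_DIGIT = re.compile(r"[A-Za-z].*[0-9]|[0-9].*[A-Za-z]")

# Variant 2: one pattern per required character
HAS_ALPHA = re.compile(r"[A-Za-z]")
HAS_DIGIT = re.compile(r"[0-9]")


def is_valid_two_patterns(text):
    return (ALLOWED_AND_LONG_ENOUGH.fullmatch(text) is not None
            and ALPHA_AND_DIGIT.search(text) is not None)


def is_valid_three_patterns(text):
    return (ALLOWED_AND_LONG_ENOUGH.fullmatch(text) is not None
            and HAS_ALPHA.search(text) is not None
            and HAS_DIGIT.search(text) is not None)


VALID = ["1foo,bar", "foo,bar1", "1fooobar", "foooobar1",
         "fooo11bar", "1234x567", "a1234567"]
INVALID = ["fooo,bar", "1234-567", ".1234567"]

for sample in VALID + INVALID:
    print(sample, is_valid_two_patterns(sample), is_valid_three_patterns(sample))
```

Neither variant needs lookahead, so the same patterns should carry over to any engine that supports character classes and bounded repetition.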
This question was originally tagged as perl, and that's how I answered it. For the Oracle stuff, I have no idea how you'd do the same thing. However, I'd try to validate this stuff before it got that far.
I wouldn't do this in one regular expression. When you decide to change the rules, you'll have the same amount of work to craft the new regular expression. I wouldn't use lookarounds for this even if they were available since I wouldn't want to tolerate all the backtracking.
This looks like it's a lot of code, but the part that addresses your problem is just the subroutine. It has very simple patterns. When the password rules change, you add or delete patterns. It might be worth it to use study, but I didn't investigate that:
use v5.10;
use strict;
use Test::More;

# Passwords that should pass every rule
my @valids = qw(
    1foo,bar
    foo,bar1
    1fooobar
    foooobar1
    fooo11bar
);

# Passwords that should fail at least one rule
my @invalids = qw(
    fooo,bar
    short
    nodigitbutlong
    12345678
    ,,,,,,,,
);

sub is_good_password {
    my( $password ) = @_;

    # One simple pattern per rule; a password must match all of them
    state $rules = [
        qr/\A[A-Z0-9,._;:-]{8,}\z/i,  # only permitted characters, at least 8 of them
        qr/[0-9]/,                    # at least one digit
        qr/[A-Z]/i,                   # at least one letter
    ];

    foreach my $rule ( @$rules ) {
        return 0 unless $password =~ $rule;
    }

    return 1;
}

foreach my $valid ( @valids ) {
    ok( is_good_password( $valid ), "Password $valid is valid" );
}

foreach my $invalid ( @invalids ) {
    ok( ! is_good_password( $invalid ), "Password $invalid is invalid" );
}

done_testing();
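If the target language is not Perl, the same rule-list idea ports easily. Here is a rough Python sketch of it, with one addition that is not in the Perl version: each rule carries a name, so the caller can see which rules failed. The `RULES` table, the rule names, and the `failed_rules` helper are all made up for this illustration.

```python
import re

# Hypothetical rule table: (name, pattern, must the pattern span the whole string?)
RULES = [
    ("allowed characters, length >= 8", re.compile(r"[A-Za-z0-9,._;:-]{8,}"), True),
    ("at least one digit", re.compile(r"[0-9]"), False),
    ("at least one letter", re.compile(r"[A-Za-z]"), False),
]


def failed_rules(password):
    """Return the names of all rules the password does not satisfy."""
    failures = []
    for name, pattern, whole_string in RULES:
        if whole_string:
            match = pattern.fullmatch(password)
        else:
            match = pattern.search(password)
        if match is None:
            failures.append(name)
    return failures


def is_good_password(password):
    return not failed_rules(password)


print(is_good_password("1foo,bar"))  # True
print(failed_rules("fooo,bar"))      # ['at least one digit']
```

Changing the password policy then means adding or removing one entry in the table, just as with the Perl array of patterns.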