How to replace lookahead in regex?

Tags: regex, oracle

I wrote a regex that validates an input string. The string must be at least 8 characters long (composed of alphanumeric and punctuation characters) and must contain at least one digit and one alphabetic character. So I've come up with this regex:

^(?=.*[0-9])(?=.*[a-zA-Z])[a-zA-Z0-9-,._;:]{8,}$

Now I have to rewrite this regex in a language that doesn't support lookahead. How should I rewrite it?

Valid inputs are:

1foo,bar
foo,bar1
1fooobar
foooobar1
fooo11bar
1234x567
a1234567

Invalid inputs:

fooo,bar
1234-567
.1234567
alessmar asked Nov 29 '11 05:11


2 Answers

There are two approaches. One is to compose a single expression which handles all possible alternatives:

^[a-zA-Z][0-9][a-zA-Z0-9-,._;:]{6,}$
  |
^[a-zA-Z][a-zA-Z0-9-,._;:][0-9][a-zA-Z0-9-,._;:]{5,}$
  |
^[a-zA-Z][a-zA-Z0-9-,._;:]{2}[0-9][a-zA-Z0-9-,._;:]{4,}$

and so on, with one alternative for each possible placement of the required letter and digit. This is a combinatorial nightmare, but it would work.
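To give a feel for how large the single expression gets, here is a minimal Python sketch that generates every alternative instead of writing them by hand. The helper names (exactly, at_least, alternatives, single_pattern) are made up for this illustration, and the enumeration is only one possible way to cover all placements of the required letter and digit, not necessarily the one intended above. The generated pattern uses nothing beyond character classes, counted repetition, grouping and alternation.

```python
import re

ANY = "[a-zA-Z0-9,._;:-]"  # any permitted character
ALPHA = "[a-zA-Z]"
DIGIT = "[0-9]"
MIN_LEN = 8


def exactly(n):
    """Exactly n permitted characters (empty pattern when n == 0)."""
    return f"{ANY}{{{n}}}" if n else ""


def at_least(n):
    """n or more permitted characters."""
    return f"{ANY}{{{n},}}"


def alternatives(first, second):
    """Patterns for strings in which a `first` char precedes a `second` char."""
    free = MIN_LEN - 2  # minimum count of characters besides the two required ones
    # Case 1: the prefix before `first` already supplies all the padding.
    alts = [f"{at_least(free)}{first}{ANY}*{second}{ANY}*"]
    for before in range(free):
        # Case 2: prefix plus gap between the required chars supply the padding.
        alts.append(
            f"{exactly(before)}{first}{at_least(free - before)}{second}{ANY}*"
        )
        # Case 3: shorter gaps, so the tail must make up the missing length.
        for gap in range(free - before):
            tail = free - before - gap
            alts.append(
                f"{exactly(before)}{first}{exactly(gap)}{second}{at_least(tail)}"
            )
    return alts


def single_pattern():
    """One lookahead-free expression covering both orders of letter and digit."""
    alts = alternatives(ALPHA, DIGIT) + alternatives(DIGIT, ALPHA)
    return "^(" + "|".join(alts) + ")$"


PATTERN = single_pattern()

for candidate in ["1foo,bar", "foo,bar1", "fooo,bar", "1234-567"]:
    print(candidate, bool(re.fullmatch(PATTERN, candidate)))
```

Under these assumptions the expression ends up with 56 alternatives for a minimum length of 8, and the count grows quadratically with the minimum length, which is why validating in several passes is usually easier to maintain.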

A much simpler approach is to validate the same string twice using two expressions:

^[a-zA-Z0-9-,._;:]{8,}$          # check length and permitted characters

and

[a-zA-Z].*[0-9]|[0-9].*[a-zA-Z]  # check required characters

EDIT: @briandfoy correctly points out that it will be more efficient to search for each required character separately:

[a-zA-Z]                         # check for required alpha

and

[0-9]                            # check for required digit
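As a rough illustration of the multi-pass approach, the Python sketch below applies the three lookahead-free patterns one after another. Python itself does support lookahead, so the original pattern from the question is kept only as a reference to compare against. The function name is_valid and the sample list are assumptions made for this example.

```python
import re

# Original pattern from the question, used here only as a reference.
LOOKAHEAD = re.compile(r"^(?=.*[0-9])(?=.*[a-zA-Z])[a-zA-Z0-9-,._;:]{8,}$")

# Lookahead-free checks, one pattern per rule.
ALLOWED = re.compile(r"^[a-zA-Z0-9-,._;:]{8,}$")  # length and permitted characters
HAS_ALPHA = re.compile(r"[a-zA-Z]")               # required alpha
HAS_DIGIT = re.compile(r"[0-9]")                  # required digit


def is_valid(text):
    """Return True only if every individual check finds a match."""
    return bool(
        ALLOWED.search(text)
        and HAS_ALPHA.search(text)
        and HAS_DIGIT.search(text)
    )


SAMPLES = [
    "1foo,bar", "foo,bar1", "1fooobar", "foooobar1", "fooo11bar",
    "1234x567", "a1234567",              # valid inputs from the question
    "fooo,bar", "1234-567", ".1234567",  # invalid inputs from the question
]

for sample in SAMPLES:
    # Both approaches should agree on every sample.
    agrees = is_valid(sample) == bool(LOOKAHEAD.search(sample))
    print(sample, is_valid(sample), "agrees" if agrees else "DIFFERS")
```

Each pattern stays trivial to read, and a new rule becomes one more pattern rather than a rewrite of a single large expression.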
MetaEd answered Oct 31 '22 10:10


This question was originally tagged as perl, and that's how I answered it. For Oracle, I have no idea how you'd do the same thing. However, I'd try to validate this stuff before it got that far.

I wouldn't do this in one regular expression. When you decide to change the rules, you'll have the same amount of work to craft the new regular expression. I wouldn't use lookarounds for this even if they were available since I wouldn't want to tolerate all the backtracking.

This looks like it's a lot of code, but the part that addresses your problem is just the subroutine. It has very simple patterns. When the password rules change, you add or delete patterns. It might be worth it to use study, but I didn't investigate that:

use v5.10;
use strict;

use Test::More;

my @valids = qw(
    1foo,bar
    foo,bar1
    1fooobar
    foooobar1
    fooo11bar
    );

my @invalids = qw( 
    fooo,bar
    short
    nodigitbutlong
    12345678
    ,,,,,,,,
    );

sub is_good_password {
    my( $password ) = @_;

    state $rules = [
        qr/\A[A-Z0-9,._;:-]{8,}\z/i,  # only permitted characters, at least 8 of them
        qr/[0-9]/,                    # at least one digit
        qr/[A-Z]/i,                   # at least one letter
        ];

    foreach my $rule ( @$rules ) {
        return 0 unless $password =~ $rule;
        }

    return 1;
    }       

foreach my $valid ( @valids ) {
    ok( is_good_password( $valid ), "Password $valid is valid" );
    }

foreach my $invalid ( @invalids ) {
    ok( ! is_good_password( $invalid ), "Password $invalid is invalid" );
    }

done_testing();
brian d foy answered Oct 31 '22 10:10