Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Chaining multiple positive lookaheads in JavaScript regex

I'm new to learning Regular Expressions, and I came across this answer which uses positive lookahead to validate passwords.

The regular expression is - (/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,}$/) and the breakdown provided by the user is -

(/^
(?=.*\d)                //should contain at least one digit
(?=.*[a-z])             //should contain at least one lower case
(?=.*[A-Z])             //should contain at least one upper case
[a-zA-Z0-9]{8,}         //should contain at least 8 from the mentioned characters
$/)

However, I'm not very clear on chaining multiple lookaheads together. From what I have learned, a positive lookahead checks if the expression is followed by what is specified in the lookahead. As an example, this answer says -

The regex is(?= all) matches the letters is, but only if they are immediately followed by the letters all

So, my question is how do the individual lookaheads work? If I break it down -

  1. The first part is ^(?=.*\d). Does this indicate that at the starting of the string, look for zero or more occurrences of any character, followed by 1 digit (thereby checking the presence of 1 digit)?
  2. If the first part is correct, then with the second part (?=.*[a-z]), does it check that after checking for Step 1 at the start of the string, look for zero or more occurrences of any character, followed by a lowercase letter? Or are the two lookaheads completely unrelated to each other?
  3. Also, what is the use of the ( ) around every lookahead? Does it create a capturing group?

I have also looked at the Rexegg article on lookaheads, but it didn't help much.

Would appreciate any help.

like image 481
Ruchika Sharma Avatar asked Nov 05 '17 11:11

Ruchika Sharma


1 Answers

As mentionned in the comments, the key point here are not the lookaheads but backtracking: (?=.*\d) looks for a complete line (.*), then backtracks to find at least one number (\d).


This is repeated throughout the different lookaheads and could be optimized like so:
(/^
(?=\D*\d)                // should contain at least one digit
(?=[^a-z]*[a-z])         // should contain at least one lower case
(?=[^A-Z]*[A-Z])         // should contain at least one upper case
[a-zA-Z0-9]{8,}          // should contain at least 8 from the mentioned characters
$/)

Here, the principle of contrast applies.

like image 65
Jan Avatar answered Sep 29 '22 05:09

Jan