
Explanation of Lookaheads in This Regular Expression

Tags:

regex

I understand regular expressions reasonably well, but I don't get to make use of them often enough to be an expert. I ran across a regular expression that I am using to validate password strength, but it contains some regex concepts I am unfamiliar with. The regex is:

^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$

and in plain English it means that the string must contain at least one lowercase character, one uppercase character, and one number, and the string must be at least six characters long. Can anyone break this down for me to explain how this pattern actually describes that rule? I see a start of string char ^ and an end of string char $, three groups with lookaheads, a match any character . and a repetition {6,}.

Thanks to any regex guru who can help me get my head around this.

Rich Miller asked Aug 06 '09 20:08


3 Answers

Under normal circumstances, a piece of a regular expression matches a piece of the input string, and "consumes" that piece of the string. The next piece of the expression matches the next piece of the string, and so on.

Lookahead assertions don't consume any of the string, so your three lookahead assertions:

  • (?=.*\d)
  • (?=.*[a-z])
  • (?=.*[A-Z])

each mean "This pattern (any run of characters followed by a digit, a lowercase letter, or an uppercase letter, respectively) must appear somewhere in the string", but they don't move the current match position forwards, so the remainder of the expression:

  • .{6,}

(which means "six or more characters") must still match the whole of the input string.
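To see the "doesn't consume" part concretely, here is a minimal Python sketch (Python is only an assumption on my part, since the question doesn't say which regex engine is in use; the sample string is made up). It compares the span matched by a lookahead with the span matched by the same sub-pattern outside a lookahead:

```python
import re

s = "abc1"  # made-up sample input

# Inside a lookahead, .*\d is only tested; the match is empty and ends at 0.
lookahead_span = re.match(r"(?=.*\d)", s).span()

# Outside a lookahead, the same sub-pattern consumes up to and including the digit.
consumed_span = re.match(r".*\d", s).span()

print(lookahead_span)  # (0, 0)
print(consumed_span)   # (0, 4)
```

Because the lookahead leaves the position at 0, whatever comes after it in the pattern still starts matching from the first character.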

RichieHindle answered Nov 15 '22 23:11


The lookahead group doesn't consume the input. This way, the same characters are actually being matched by the different lookahead groups.

You can think of it this way: search for anything (.*) until you find a digit (\d). If you do, go back to the beginning of this group (the concept of lookahead). Now look for anything (.*) until you find a lower case letter. Repeat for upper case letter. Now, match any 6 or more characters.
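The "go back to the beginning" idea can be sketched in Python as four separate checks that each start from position 0. This is an illustration only: check_stepwise and the sample strings are hypothetical names of mine, and the comparison with the original regex is only meant for single-line input (the dot doesn't match a newline, and $ can match just before a trailing newline).

```python
import re

PATTERN = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{6,}$")

def check_stepwise(s: str) -> bool:
    """Hypothetical helper: run each condition separately, always from position 0."""
    has_digit = re.match(r".*\d", s) is not None       # first lookahead
    has_lower = re.match(r".*[a-z]", s) is not None    # second lookahead
    has_upper = re.match(r".*[A-Z]", s) is not None    # third lookahead
    long_enough = re.fullmatch(r".{6,}", s) is not None  # the consuming part
    return has_digit and has_lower and has_upper and long_enough

samples = ["Passw0rd", "password1", "PASSWORD1", "Pw0rd", "abcDEF"]
results = {s: check_stepwise(s) for s in samples}
agrees = all((PATTERN.match(s) is not None) == results[s] for s in samples)
print(results, agrees)
```

Only "Passw0rd" passes here; each of the others fails exactly one of the four checks.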

Sinan Taifour answered Nov 16 '22 00:11


To break it down completely:

^ -- Match the beginning of the string (or of a line, in multiline mode)
(?=.*\d) -- The text that follows contains a digit
(?=.*[a-z]) -- The text that follows contains a lowercase letter
(?=.*[A-Z]) -- The text that follows contains an uppercase letter
.{6,} -- Match six or more of any character, with no upper limit
$ -- Match the end of the string (or of a line, in multiline mode)
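The same breakdown can be kept next to the pattern itself. Below is a sketch using Python's re.VERBOSE flag, which ignores whitespace and allows # comments inside the pattern; Python and the names STRONG and is_strong are my own assumptions, not something from the question.

```python
import re

STRONG = re.compile(r"""
    ^               # start of the string
    (?=.*\d)        # lookahead: a digit somewhere ahead
    (?=.*[a-z])     # lookahead: a lowercase letter somewhere ahead
    (?=.*[A-Z])     # lookahead: an uppercase letter somewhere ahead
    .{6,}           # consume six or more characters
    $               # end of the string
""", re.VERBOSE)

def is_strong(password: str) -> bool:
    """Hypothetical wrapper: True when the password satisfies the pattern."""
    return STRONG.match(password) is not None

print(is_strong("abcDE1"))  # True
print(is_strong("abcde1"))  # False, no uppercase letter
print(is_strong("aB1"))     # False, fewer than six characters
```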
Sean Vieira answered Nov 15 '22 23:11