Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regexp Java for password validation

Tags:

java

regex

Try this:

^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,}$

Explanation:

^                 # start-of-string
(?=.*[0-9])       # a digit must occur at least once
(?=.*[a-z])       # a lower case letter must occur at least once
(?=.*[A-Z])       # an upper case letter must occur at least once
(?=.*[@#$%^&+=])  # a special character must occur at least once
(?=\S+$)          # no whitespace allowed in the entire string
.{8,}             # anything, at least eight places though
$                 # end-of-string

It's easy to add, modify or remove individual rules, since every rule is an independent "module".

The (?=.*[xyz]) construct eats the entire string (.*) and backtracks to the first occurrence where [xyz] can match. It succeeds if [xyz] is found, it fails otherwise.

The alternative would be using a reluctant qualifier: (?=.*?[xyz]). For a password check, this will hardly make any difference, for much longer strings it could be the more efficient variant.

The most efficient variant (but hardest to read and maintain, therefore the most error-prone) would be (?=[^xyz]*[xyz]), of course. For a regex of this length and for this purpose, I would dis-recommend doing it that way, as it has no real benefits.


simple example using regex

public class passwordvalidation {
    public static void main(String[] args) {
      String passwd = "aaZZa44@"; 
      String pattern = "(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{8,}";
      System.out.println(passwd.matches(pattern));
   }
}

Explanations:

  • (?=.*[0-9]) a digit must occur at least once
  • (?=.*[a-z]) a lower case letter must occur at least once
  • (?=.*[A-Z]) an upper case letter must occur at least once
  • (?=.*[@#$%^&+=]) a special character must occur at least once
  • (?=\\S+$) no whitespace allowed in the entire string
  • .{8,} at least 8 characters

All the previously given answers use the same (correct) technique to use a separate lookahead for each requirement. But they contain a couple of inefficiencies and a potentially massive bug, depending on the back end that will actually use the password.

I'll start with the regex from the accepted answer:

^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,}$

First of all, since Java supports \A and \z I prefer to use those to make sure the entire string is validated, independently of Pattern.MULTILINE. This doesn't affect performance, but avoids mistakes when regexes are recycled.

\A(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,}\z

Checking that the password does not contain whitespace and checking its minimum length can be done in a single pass by using the all at once by putting variable quantifier {8,} on the shorthand \S that limits the allowed characters:

\A(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])\S{8,}\z

If the provided password does contain a space, all the checks will be done, only to have the final check fail on the space. This can be avoided by replacing all the dots with \S:

\A(?=\S*[0-9])(?=\S*[a-z])(?=\S*[A-Z])(?=\S*[@#$%^&+=])\S{8,}\z

The dot should only be used if you really want to allow any character. Otherwise, use a (negated) character class to limit your regex to only those characters that are really permitted. Though it makes little difference in this case, not using the dot when something else is more appropriate is a very good habit. I see far too many cases of catastrophic backtracking because the developer was too lazy to use something more appropriate than the dot.

Since there's a good chance the initial tests will find an appropriate character in the first half of the password, a lazy quantifier can be more efficient:

\A(?=\S*?[0-9])(?=\S*?[a-z])(?=\S*?[A-Z])(?=\S*?[@#$%^&+=])\S{8,}\z

But now for the really important issue: none of the answers mentions the fact that the original question seems to be written by somebody who thinks in ASCII. But in Java strings are Unicode. Are non-ASCII characters allowed in passwords? If they are, are only ASCII spaces disallowed, or should all Unicode whitespace be excluded.

By default \s matches only ASCII whitespace, so its inverse \S matches all Unicode characters (whitespace or not) and all non-whitespace ASCII characters. If Unicode characters are allowed but Unicode spaces are not, the UNICODE_CHARACTER_CLASS flag can be specified to make \S exclude Unicode whitespace. If Unicode characters are not allowed, then [\x21-\x7E] can be used instead of \S to match all ASCII characters that are not a space or a control character.

Which brings us to the next potential issue: do we want to allow control characters? The first step in writing a proper regex is to exactly specify what you want to match and what you don't. The only 100% technically correct answer is that the password specification in the question is ambiguous because it does not state whether certain ranges of characters like control characters or non-ASCII characters are permitted or not.


You should not use overly complex Regex (if you can avoid them) because they are

  • hard to read (at least for everyone but yourself)
  • hard to extend
  • hard to debug

Although there might be a small performance overhead in using many small regular expressions, the points above outweight it easily.

I would implement like this:

bool matchesPolicy(pwd) {
    if (pwd.length < 8) return false;
    if (not pwd =~ /[0-9]/) return false;
    if (not pwd =~ /[a-z]/) return false;
    if (not pwd =~ /[A-Z]/) return false;
    if (not pwd =~ /[%@$^]/) return false;
    if (pwd =~ /\s/) return false;
    return true;
}

Thanks for all answers, based on all them but extending sphecial characters:

@SuppressWarnings({"regexp", "RegExpUnexpectedAnchor", "RegExpRedundantEscape"})
String PASSWORD_SPECIAL_CHARS = "@#$%^`<>&+=\"!ºª·#~%&'¿¡€,:;*/+-.=_\\[\\]\\(\\)\\|\\_\\?\\\\";
int PASSWORD_MIN_SIZE = 8;
String PASSWORD_REGEXP = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[" + PASSWORD_SPECIAL_CHARS + "])(?=\\S+$).{"+PASSWORD_MIN_SIZE+",}$";

Unit tested:

enter image description here


Password Requirement :

  • Password should be at least eight (8) characters in length where the system can support it.
  • Passwords must include characters from at least two (2) of these groupings: alpha, numeric, and special characters.

    ^.*(?=.{8,})(?=.*\d)(?=.*[a-zA-Z])|(?=.{8,})(?=.*\d)(?=.*[!@#$%^&])|(?=.{8,})(?=.*[a-zA-Z])(?=.*[!@#$%^&]).*$
    

I tested it and it works


For anyone interested in minimum requirements for each type of character, I would suggest making the following extension over Tomalak's accepted answer:

^(?=(.*[0-9]){%d,})(?=(.*[a-z]){%d,})(?=(.*[A-Z]){%d,})(?=(.*[^0-9a-zA-Z]){%d,})(?=\S+$).{%d,}$

Notice that this is a formatting string and not the final regex pattern. Just substitute %d with the minimum required occurrences for: digits, lowercase, uppercase, non-digit/character, and entire password (respectively). Maximum occurrences are unlikely (unless you want a max of 0, effectively rejecting any such characters) but those could be easily added as well. Notice the extra grouping around each type so that the min/max constraints allow for non-consecutive matches. This worked wonders for a system where we could centrally configure how many of each type of character we required and then have the website as well as two different mobile platforms fetch that information in order to construct the regex pattern based on the above formatting string.