
How to validate a user name with regex?

Tags:

regex

This seems to match the rules I have defined, but I only started learning regex tonight, so I am wondering whether it is correct.

Rules:

  • Usernames can consist of lowercase and capitals
  • Usernames can consist of alphanumeric characters
  • Usernames can consist of underscore and hyphens and spaces
  • Cannot have two underscores, two hyphens or two spaces in a row
  • Cannot have an underscore, hyphen or space at the start or end

Regex pattern:

/^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$/ 
Zim asked Aug 03 '09 12:08



1 Answer

The specs in the question aren't very clear, so I'll just assume the string can contain only ASCII letters and digits, with hyphens, underscores and spaces as internal separators. The meat of the problem is ensuring that the first and last characters are not separators, and that there's never more than one separator in a row (that part seems clear, anyway). Here's the simplest way:

/^[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*$/ 

After matching one or more alphanumeric characters, if there's a separator it must be followed by one or more alphanumerics; repeat as needed.
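
To see how that pattern behaves, here is a minimal sketch in Python. The language is my choice for illustration, since the question doesn't name one, and the helper name is_valid_username and the sample names are made up. It calls fullmatch rather than match because in Python a bare $ can also match just before a trailing newline.

```python
import re

# The pattern from the answer, unchanged.
USERNAME_RE = re.compile(r"^[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*$")


def is_valid_username(name: str) -> bool:
    """Hypothetical helper: True if name follows the separator rules."""
    # fullmatch requires the whole string to be consumed, so a name with a
    # trailing newline is rejected even though "$" alone would tolerate it.
    return USERNAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    samples = ["a", "a-b", "John Smith_99", "-a", "a-", "a--b", "a_ b", ""]
    for sample in samples:
        print(repr(sample), is_valid_username(sample))
```

The first three samples should be accepted; the rest start or end with a separator, contain two separators in a row, or are empty.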

Let's look at regexes from some of the other answers.

/^[[:alnum:]]+(?:[-_ ]?[[:alnum:]]+)*$/ 

This is effectively the same (assuming your regex flavor supports the POSIX character-class notation), but why make the separator optional? The only reason you'd be in that part of the regex in the first place is if there's a separator or some other, invalid character.

/^[a-zA-Z0-9]+([_\s\-]?[a-zA-Z0-9])*$/ 

On the other hand, this only works because the separator is optional. After the first separator, it can only match one alphanumeric at a time. To match more, it has to keep repeating the whole group: zero separators followed by one alphanumeric, over and over. If the second [a-zA-Z0-9] were followed by a plus sign, it could find a match by a much more direct route.
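
The point about the optional separator can be checked with a small Python sketch. The two variant patterns below are mine, built only for comparison: one drops the question mark, the other adds the plus sign mentioned above. The helper name matches is hypothetical as well.

```python
import re

# Pattern quoted above: optional separator, single alphanumeric in the group.
WITH_OPTIONAL = re.compile(r"^[a-zA-Z0-9]+([_\s\-]?[a-zA-Z0-9])*$")
# Comparison variant (not from the thread): same pattern with the "?" removed.
WITHOUT_OPTIONAL = re.compile(r"^[a-zA-Z0-9]+([_\s\-][a-zA-Z0-9])*$")
# Comparison variant: mandatory separator, plus sign after the second class.
WITH_PLUS = re.compile(r"^[a-zA-Z0-9]+([_\s\-][a-zA-Z0-9]+)*$")


def matches(pattern: re.Pattern, name: str) -> bool:
    """Hypothetical helper: True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["a-bcd", "abc-d", "a--b"]:
        print(
            repr(name),
            matches(WITH_OPTIONAL, name),
            matches(WITHOUT_OPTIONAL, name),
            matches(WITH_PLUS, name),
        )
```

With a mandatory separator and no plus sign, a name like a-bcd is rejected, because nothing can consume the c and d that follow the first alphanumeric after the hyphen.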

/^[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9](?<![_\s\-]{2,}.*)$/ 

This uses unbounded lookbehind, which is a very rare feature, but you can use a lookahead to the same effect:

/^(?!.*[_\s-]{2,})[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9]$/ 

This performs essentially a separate search for two consecutive separators, and fails the match if it finds one. The main body then only needs to make sure all the characters are alphanumerics or separators, with the first and last being alphanumerics. Since those two are required, the name must be at least two characters long.
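
A short Python sketch of the lookahead version, assuming single-line input; the helper name accepts and the sample strings are hypothetical. Note that \s covers tabs and other whitespace as well as the plain space, so this pattern is a little more permissive than the rules in the question.

```python
import re

# Lookahead pattern from the answer, unchanged.
LOOKAHEAD_RE = re.compile(
    r"^(?!.*[_\s-]{2,})[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9]$"
)


def accepts(name: str) -> bool:
    """Hypothetical helper: True if the whole name matches LOOKAHEAD_RE."""
    return LOOKAHEAD_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["ab", "a", "a-b", "a--b", "a_ b", "-ab", "ab-", "a\tb"]:
        print(repr(name), accepts(name))
```

The single-character name a is rejected here, which reflects the two-character minimum described above; the tab-separated sample is accepted because of \s.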

/^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$/ 

This is your own regex, and it requires the string to start and end with at least two alphanumeric characters, and if there are two separators within the string, there have to be exactly two alphanumerics between them. So ab, ab-cd and ab-cd-ef will match, but a, a-b and a-b-c won't.
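
Those examples can be reproduced with a few lines of Python. This is only a sketch: the helper name original_matches is made up, and ab-cde-fg and abc-def are extra samples I added to show the spacing constraint between separators.

```python
import re

# The regex from the question, unchanged.
ORIGINAL_RE = re.compile(
    r"^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$"
)


def original_matches(name: str) -> bool:
    """Hypothetical helper: True if the whole name matches ORIGINAL_RE."""
    return ORIGINAL_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["ab", "ab-cd", "ab-cd-ef", "abc-def",
                 "a", "a-b", "a-b-c", "ab-cde-fg"]:
        print(repr(name), original_matches(name))
```

The first four samples match; the last four do not, including ab-cde-fg, which has three alphanumerics between its two separators.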

Also, as some of the commenters have pointed out, the (_|-| ) in your regex should be [-_ ]. That part's not incorrect, but if you have a choice between an alternation and a character class, you should always go with the character class: they're more efficient as well as more readable.

Again, I'm not worried about whether "alphanumeric" is supposed to include non-ASCII characters, or the exact meaning of "space", just how to enforce a policy of non-contiguous internal separators with a regex.

Alan Moore answered Sep 17 '22 13:09