This seems to match the rules I have defined, but I only started learning regex tonight, so I am wondering if it is correct.
Rules:
Regex pattern:
/^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$/
To validate a RegExp just run it against null (no need to know the data you want to test against upfront). If it returns explicit false ( === false ), it's broken. Otherwise it's valid though it need not match anything.
The specs in the question aren't very clear, so I'll just assume the string can contain only ASCII letters and digits, with hyphens, underscores and spaces as internal separators. The meat of the problem is ensuring that the first and last characters are not separators, and that there's never more than one separator in a row (that part seems clear, anyway). Here's the simplest way:
/^[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*$/
After matching one or more alphanumeric characters, if there's a separator it must be followed by one or more alphanumerics; repeat as needed.
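As a quick check of that pattern, here is a minimal Python sketch. The helper name is_valid_name and the sample strings are made up for illustration, and it assumes the same ASCII-only rules as above. It uses fullmatch because in Python a bare $ also matches just before a trailing newline.

```python
import re

# The pattern from the answer: runs of alphanumerics joined by single separators.
NAME_RE = re.compile(r"^[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*$")


def is_valid_name(s: str) -> bool:
    """Return True if s fits the assumed rules (hypothetical helper)."""
    # fullmatch rejects a trailing newline, which "$" alone would tolerate.
    return NAME_RE.fullmatch(s) is not None


if __name__ == "__main__":
    for sample in ["a", "ab-cd", "a_b c-d", "-ab", "ab-", "a--b"]:
        print(repr(sample), is_valid_name(sample))
```

The first three samples are accepted; the last three are rejected because they start with, end with, or double up a separator.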
Let's look at regexes from some of the other answers.
/^[[:alnum:]]+(?:[-_ ]?[[:alnum:]]+)*$/
This is effectively the same (assuming your regex flavor supports the POSIX character-class notation), but why make the separator optional? The only reason you'd be in that part of the regex in the first place is if there's a separator or some other, invalid character.
/^[a-zA-Z0-9]+([_\s\-]?[a-zA-Z0-9])*$/
On the other hand, this only works because the separator is optional. After the first separator, it can only match one alphanumeric at a time. To match more, it has to keep repeating the whole group: zero separators followed by one alphanumeric, over and over. If the second [a-zA-Z0-9] were followed by a plus sign, it could find a match by a much more direct route.
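To see why the optional separator matters there, here is a small Python sketch comparing that regex with a hypothetical variant (REQUIRED_SEP_RE, not from any of the answers) in which the separator is required but the group still matches only one alphanumeric.

```python
import re

# The regex from the other answer: optional separator, one alphanumeric per group.
OPTIONAL_SEP_RE = re.compile(r"^[a-zA-Z0-9]+([_\s\-]?[a-zA-Z0-9])*$")

# Hypothetical variant for illustration: separator required, still one alphanumeric.
REQUIRED_SEP_RE = re.compile(r"^[a-zA-Z0-9]+([_\s\-][a-zA-Z0-9])*$")


def accepts(pattern: re.Pattern, s: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(s) is not None


if __name__ == "__main__":
    for sample in ["ab-c", "ab-cd"]:
        print(repr(sample),
              accepts(OPTIONAL_SEP_RE, sample),
              accepts(REQUIRED_SEP_RE, sample))
```

With the separator required, "ab-cd" is rejected because nothing can consume the trailing "d". The original only gets through by repeating the group with zero separators.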
/^[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9](?<![_\s\-]{2,}.*)$/
This uses unbounded lookbehind, which is a very rare feature, but you can use a lookahead to the same effect:
/^(?!.*[_\s-]{2,})[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9]$/
This performs essentially a separate search for two consecutive separators, and fails the match if it finds one. The main body then only needs to make sure all the characters are alphanumerics or separators, with the first and last being alphanumerics. Since those two are required, the name must be at least two characters long.
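A short Python sketch of the lookahead version, with made-up sample strings, shows both effects described above: the two-character minimum and the rejection of consecutive separators. Note that \s accepts any whitespace, not just the space character.

```python
import re

# The lookahead version: first rule out two separators in a row anywhere,
# then require alphanumerics at both ends with anything allowed in between.
LOOKAHEAD_RE = re.compile(
    r"^(?!.*[_\s-]{2,})[a-zA-Z0-9][a-zA-Z0-9_\s\-]*[a-zA-Z0-9]$"
)


def accepts(s: str) -> bool:
    """Return True if the whole string matches the lookahead pattern."""
    return LOOKAHEAD_RE.fullmatch(s) is not None


if __name__ == "__main__":
    for sample in ["a", "ab", "a-b", "a--b", "a- b", "a\tb"]:
        print(repr(sample), accepts(sample))
```

Here "a" fails only because of the length requirement, while "a--b" and "a- b" fail in the lookahead. "a\tb" passes, since a tab counts as \s.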
/^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$/
This is your own regex, and it requires the string to start and end with two alphanumeric characters, and if there are two separators within the string, there have to be exactly two alphanumerics between them. So ab, ab-cd and ab-cd-ef will match, but a, a-b and a-b-c won't.
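Those claims are easy to check. The sketch below runs the regex from the question, unchanged, against the six examples plus two extra strings of my own (abc-de-fgh and abc-def-ghi) that probe the "exactly two alphanumerics between separators" behaviour.

```python
import re

# The regex from the question, unchanged.
ORIGINAL_RE = re.compile(
    r"^[a-zA-Z0-9]+([a-zA-Z0-9](_|-| )[a-zA-Z0-9])*[a-zA-Z0-9]+$"
)


def accepts(s: str) -> bool:
    """Return True if the whole string matches the question's regex."""
    return ORIGINAL_RE.fullmatch(s) is not None


if __name__ == "__main__":
    samples = ["ab", "ab-cd", "ab-cd-ef", "a", "a-b", "a-b-c",
               "abc-de-fgh", "abc-def-ghi"]
    for sample in samples:
        print(repr(sample), accepts(sample))
```

abc-de-fgh is accepted because exactly two alphanumerics sit between the hyphens; abc-def-ghi is rejected because there are three.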
Also, as some of the commenters have pointed out, the (_|-| ) in your regex should be [-_ ]. That part's not incorrect, but if you have a choice between an alternation and a character class, you should always go with the character class: they're more efficient as well as more readable.
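As a sanity check that the two spellings accept the same characters, here is a minimal Python sketch. It only tests equivalence over the ASCII range and makes no attempt to measure the efficiency difference; the function name same_verdict is made up.

```python
import re

# The alternation from the question and the equivalent character class.
ALTERNATION_RE = re.compile(r"(_|-| )")
CHAR_CLASS_RE = re.compile(r"[-_ ]")


def same_verdict(ch: str) -> bool:
    """Return True if both patterns agree on whether ch is a separator."""
    alt = ALTERNATION_RE.fullmatch(ch) is not None
    cls = CHAR_CLASS_RE.fullmatch(ch) is not None
    return alt == cls


if __name__ == "__main__":
    # Compare the two patterns on every ASCII character.
    print(all(same_verdict(chr(code)) for code in range(128)))
```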
Again, I'm not worried about whether "alphanumeric" is supposed to include non-ASCII characters, or the exact meaning of "space", just how to enforce a policy of non-contiguous internal separators with a regex.