Just starting to explore the 'wonders' of regex. Being someone who learns from trial and error, I'm really struggling because my trials are throwing up a disproportionate number of errors... My experiments are in PHP using ereg().
Anyway. I work with first and last names separately but for now using the same regex. So far I have:
^[A-Z][a-zA-Z]+$
Any length string that starts with a capital and has only letters (capital or not) for the rest. But where I fall apart is dealing with the special situations that can pretty much occur anywhere.
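To make the starting point concrete, here is a minimal sketch of that pattern run against a few sample names. It uses Python's re module purely for illustration (the question uses PHP); the helper name is_basic_name and the sample names are my own.

```python
import re

# The pattern from the question: one capital letter, then one or more letters.
BASIC_NAME = re.compile(r"^[A-Z][a-zA-Z]+$")


def is_basic_name(name: str) -> bool:
    """Return True if the whole string matches the basic name pattern."""
    return BASIC_NAME.match(name) is not None


# Plain names pass; anything with punctuation, or a lone initial, is rejected.
for candidate in ["Smith", "McDonald", "smith", "O'Connor", "Worthington-Smith", "A"]:
    print(candidate, is_basic_name(candidate))
```

Note that the + after the second class means at least two letters are required in total, so a single initial does not match.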
Maybe there's some other way a name can be that I'm not thinking of, but I suspect if I can get my head around this, I can add to it. I'm pretty sure there will be instances where more than one of these situations comes up in one name.
So, I think the bottom line is to have my regex also accept a space, hyphens, ampersands and apostrophes - but not at the start or end of the name to be technically correct.
A valid username should start with a letter, so [A-Za-z]. All other characters can be letters, digits or an underscore, so [A-Za-z0-9_].
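Put together, the two classes could be combined as in the sketch below. The * quantifier (so a one-letter username is allowed) and the function name is_valid_username are my assumptions; the text does not give a minimum length.

```python
import re

# First character: a letter. Remaining characters: letters, digits or underscore.
# Assumption: zero or more trailing characters are allowed (the * quantifier).
USERNAME = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def is_valid_username(username: str) -> bool:
    """Return True if username starts with a letter and uses only [A-Za-z0-9_]."""
    return USERNAME.match(username) is not None


for candidate in ["alice_99", "a", "9lives", "_bob", "bad-name"]:
    print(candidate, is_valid_username(candidate))
```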
$ means "Match the end of the string" (the position after the last character in the string).
[] denotes a character class. () denotes a capturing group. [a-z0-9] matches one character that is in the range a-z or 0-9.
Special regex characters: these characters have special meaning in regex: . , + , * , ? , ^ , $ , ( , ) , [ , ] , { , } , | , \ . Escape sequences (\char): to match a character that has special meaning in regex, prefix it with a backslash ( \ ).
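As a small illustration of why escaping matters, the sketch below compares an unescaped + with an escaped one. It uses Python's re module; the sample string "1+1=2" and the helper name full_match are made up for the example.

```python
import re

text = "1+1=2"

# Unescaped: '+' is a quantifier, so this means "one or more 1s, then 1=2".
unescaped = re.compile(r"1+1=2")

# Escaped by hand: '\+' matches a literal plus sign.
escaped = re.compile(r"1\+1=2")

# re.escape() adds the backslashes for you when the text is not known in advance.
auto_escaped = re.compile(re.escape(text))


def full_match(pattern: "re.Pattern[str]", s: str) -> bool:
    """Return True if the whole string s matches pattern."""
    return pattern.fullmatch(s) is not None


print(full_match(unescaped, "1+1=2"))     # the literal text is NOT matched
print(full_match(unescaped, "11=2"))      # but this unintended string is
print(full_match(escaped, "1+1=2"))
print(full_match(auto_escaped, "1+1=2"))
```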
This regex is perfect for me.
^([ \u00c0-\u01ffa-zA-Z'\-])+$
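It works fine for me in PHP using preg_match(), but doesn't work everywhere: \uXXXX escapes are not supported by every engine (PCRE, for one, normally spells a code point as \x{00c0} and needs the u modifier).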
It matches Jérémie O'Co-nor
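so I think it covers most accented Latin names. Note that the range \u00c0-\u01ff only spans the Latin-1 Supplement letters, Latin Extended-A and part of Latin Extended-B, so it does not match every name that can be written in UTF-8 (Cyrillic, Greek or CJK names, for example).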
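A rough way to see what that range does and does not accept is sketched below. It is written in Python, whose re module understands \uXXXX escapes; the helper name is_latin_name, the NFC normalization step and the sample names other than the one above are my additions.

```python
import re
import unicodedata

# Same character class as the pattern above, without the capturing group.
LATIN_NAME = re.compile(r"^[ \u00c0-\u01ffa-zA-Z'\-]+$")


def is_latin_name(name: str) -> bool:
    """Return True if name uses only the characters allowed by the pattern."""
    # Assumption: normalize to NFC first so that "e + combining accent"
    # becomes the single precomposed character the range expects.
    composed = unicodedata.normalize("NFC", name)
    return LATIN_NAME.match(composed) is not None


for candidate in ["J\u00e9r\u00e9mie O'Co-nor", "\u0141ukasz", "\u0418\u0432\u0430\u043d", "\u00d7"]:
    print(candidate, is_latin_name(candidate))
```

The last sample is the multiplication sign (U+00D7), which sits inside the range and is therefore accepted even though it is not a letter; the Cyrillic sample is rejected.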
Add a - into the second character class. The easiest way to do that is to add it at the start so that it can't possibly be interpreted as a range operator (as in a-z).
^[A-Z][-a-zA-Z]+$
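The effect of the hyphen's position inside a class can be checked with a quick sketch like the following (Python for illustration; the names range_class, literal_class and matches are hypothetical).

```python
import re

range_class = re.compile(r"^[a-z]$")     # hyphen between a and z: a range
literal_class = re.compile(r"^[-az]$")   # hyphen first: a literal '-', 'a' or 'z'
hyphen_name = re.compile(r"^[A-Z][-a-zA-Z]+$")  # the pattern given above


def matches(pattern: "re.Pattern[str]", text: str) -> bool:
    """Return True if text matches pattern from start to end."""
    return pattern.match(text) is not None


print(matches(range_class, "m"), matches(range_class, "-"))
print(matches(literal_class, "m"), matches(literal_class, "-"))
print(matches(hyphen_name, "Worthington-Smith"), matches(hyphen_name, "-Smith"))
```

One thing this pattern does not prevent is a trailing hyphen: "Smith-" still matches, because the hyphen is allowed anywhere after the first letter.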
A naive way of handling apostrophes would be to do the same as above, adding ' to the character class, giving:
^[A-Z][-'a-zA-Z]+$
Don't forget you may need to escape it inside the string! A 'better' way, given your example might be:
^[A-Z]'?[-a-zA-Z]+$
Which will allow a possible single apostrophe in the second position.
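The difference between the naive and the stricter apostrophe handling shows up on a few edge cases. The sketch below is in Python for illustration; the sample names and the helper name accepts are my own.

```python
import re

# Naive: apostrophe allowed anywhere after the first letter.
NAIVE = re.compile(r"^[A-Z][-'a-zA-Z]+$")

# Stricter: at most one apostrophe, and only in the second position.
STRICT = re.compile(r"^[A-Z]'?[-a-zA-Z]+$")


def accepts(pattern: "re.Pattern[str]", name: str) -> bool:
    """Return True if name matches pattern from start to end."""
    return pattern.match(name) is not None


for name in ["O'Connor", "D'Angelo", "O''Connor", "Smith'"]:
    print(name, "naive:", accepts(NAIVE, name), "strict:", accepts(STRICT, name))
```

Both accept O'Connor and D'Angelo; only the naive version lets through a doubled or trailing apostrophe.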
For spaces, I'd be tempted to just do our naive way again and add a space to the character class:
^[A-Z]'?[- a-zA-Z]+$
A potentially better way might be:
^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$
Which looks for extra words at the end. This probably isn't a good idea if you're trying to match names in a body of extra text, but then again, the original wouldn't have done that well either.
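Here is a sketch of the "extra words" idea, assuming each word is meant to be one or more characters long (hence the + quantifiers in the pattern below) and that spaces may only appear singly between words. It is written in Python for illustration; the helper name is_multiword_name and the sample names are my own.

```python
import re

# First word: capital, optional apostrophe, then letters or hyphens.
# Then zero or more further words, each preceded by exactly one space.
MULTIWORD_NAME = re.compile(r"^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$")


def is_multiword_name(name: str) -> bool:
    """Return True if name is one or more space-separated words."""
    return MULTIWORD_NAME.match(name) is not None


for name in ["Smith", "Van der Humpton", "O'Connor Smith", "Van  der", "Van ", " Van"]:
    print(repr(name), is_multiword_name(name))
```

Because the space sits outside the character class, leading, trailing and doubled spaces are all rejected, which is what the question asked for.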
As for ampersands: at this point you're not looking at single names anymore, are you?
Anyway, as you can see, regexes have a habit of growing very quickly...