
Regexp for a name

I need to make sure people enter their first, middle and last names correctly for a form in Rails. So the first thought for a regular expression is:

\A[[:upper:]][[:alpha:]'-]+( [[:upper:]][[:alpha:]'-]*)*\z

That'll make sure every word in the name starts with an uppercase letter followed by letters, hyphens or apostrophes.
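A quick way to see what that pattern accepts is to run it against a few sample names in plain Ruby. This is only a sketch: the constant and the `capitalized_name?` helper are hypothetical names chosen for illustration and are not part of Rails.

```ruby
# The question's first pattern, unchanged: every space-separated word must
# start with an uppercase letter.
CAPITALIZED_NAME = /\A[[:upper:]][[:alpha:]'-]+( [[:upper:]][[:alpha:]'-]*)*\z/

# Hypothetical helper, not part of Rails.
def capitalized_name?(name)
  CAPITALIZED_NAME.match?(name)
end

["Mary-Jane O'Neil", "john smith", "van Rossum", "J Smith"].each do |name|
  puts "#{name.inspect}: #{capitalized_name?(name)}"
end
```

Only "Mary-Jane O'Neil" passes here. Note that the first word uses `+` while later words use `*`, so a one-letter first word such as "J Smith" is rejected even though "John S" is accepted.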

My first question I guess doesn't have much to do with regular expressions, though I'm hoping there's a regular expression I can copy for this. Are letters, hyphens and apostrophes the only characters I should be checking in a name?

My second question is whether it's important to make sure each name has at least one uppercase letter. So many people enter all-lowercase names and I really want to avoid that, but is it sometimes legitimate?

Here's what I have so far that makes sure there's at least 1 uppercase letter somewhere in the name:

\A([[:alpha:]'-]+ )*[[:alpha:]'-]*[[:upper:]][[:alpha:]'-]*( [[:alpha:]'-]+)*\z
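To check that this second pattern behaves as described, it can be tried on names that mix lowercase and capitalized words. Again a sketch only; `name_with_uppercase?` is a hypothetical helper name.

```ruby
# The question's second pattern, unchanged: words may be lowercase as long as
# at least one uppercase letter appears somewhere in the name.
NAME_WITH_UPPERCASE =
  /\A([[:alpha:]'-]+ )*[[:alpha:]'-]*[[:upper:]][[:alpha:]'-]*( [[:alpha:]'-]+)*\z/

# Hypothetical helper, not part of Rails.
def name_with_uppercase?(name)
  NAME_WITH_UPPERCASE.match?(name)
end

["van Rossum", "john Smith", "john smith"].each do |name|
  puts "#{name.inspect}: #{name_with_uppercase?(name)}"
end
```

The first two names pass because of the single capital letter; "john smith" is the only one rejected.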

Isn't there a [:name:] bracket expression? :)

UPDATE: I added . and , to the characters allowed, surprised I didn't think of them originally. So many people must have to deal with this kind of regular expression! Nobody has any pre-made regular expressions for this sort of thing?

asked Mar 26 '14 09:03 by at.




1 Answer

A good start would be to allow letters, marks, punctuation and whitespace, so that you accept a given name like "María-Jose" and a last name like "van Rossum" (note the whitespace). That boils down to something like:

[\p{Letter}\p{Mark}\p{Punctuation}\p{Separator}]+

If you want to restrict that a bit, you could have a look at classes like \p{Lowercase_Letter}, \p{Uppercase_Letter} and \p{Titlecase_Letter}, but there may be scripts that don't have casing. \p{Space_Separator} and \p{Dash_Punctuation} can narrow it down to the names that I know. But the names I don't know... I don't know.
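In Ruby, the two variants above could be sketched as follows. The anchors `\A` and `\z` are added so the whole string has to match, and the short aliases `\p{L}`, `\p{M}`, `\p{P}`, `\p{Z}`, `\p{Pd}` and `\p{Zs}` stand in for the long property names. The constants and helper methods are hypothetical names, used only for illustration.

```ruby
# Broad variant: letters, combining marks, any punctuation, any separator.
PERMISSIVE_NAME = /\A[\p{L}\p{M}\p{P}\p{Z}]+\z/

# Narrower variant: only dash punctuation and space separators are allowed
# besides letters and marks.
NARROWER_NAME = /\A[\p{L}\p{M}\p{Pd}\p{Zs}]+\z/

# Hypothetical helpers, not part of Rails.
def permissive_name?(name)
  PERMISSIVE_NAME.match?(name)
end

def narrower_name?(name)
  NARROWER_NAME.match?(name)
end

["María-Jose", "van Rossum", "Smith, Jr.", "O'Neil", "R2-D2"].each do |name|
  puts "#{name.inspect}: broad=#{permissive_name?(name)} narrow=#{narrower_name?(name)}"
end
```

The narrower pattern shows the cost of restricting: it rejects "Smith, Jr." and "O'Neil", because the comma, full stop and apostrophe are punctuation but not dashes. Both patterns reject "R2-D2", since digits are neither letters nor marks.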

But before you start constructing your regex for "validating" a name, please read this excellent piece on names by the W3C. It will shake even your concepts of first, middle and last names.

For example:

In some cultures you are given a name (Björk, Osama) and an indication of who your father (or mother) was (Guðmundsdóttir, bin Mohammed). So the "first name" could be "Björk" but:

Björk wouldn’t normally expect to be called Ms. Guðmundsdóttir. Telephone directories in Iceland are sorted by given name.

But in other cultures, the first name is not the given name but the family name. In "Zhāng Mànyù", "Zhāng" is the family name. How to address her would depend on how well you know her, but again "Ms. Zhāng" would be strange.

The list of examples goes on and ends with some 30+ links to Wikipedia for more examples.

The article does end with suggestions for field design and some pointers on what characters to allow:

Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names. Don't require names to be entered all in upper case – this can be difficult on a mobile device. Allow the user to enter a name with spaces, eg. to support prefixes and suffixes such as de in French, von in German, and Jnr/Jr in American names, and also because some people consider a space-separated sequence of characters to be a single name, eg. Rose Marie.

answered Oct 14 '22 07:10 by Chris Wesseling