
Regex for names

Tags: regex, php

Just starting to explore the 'wonders' of regex. Being someone who learns from trial and error, I'm really struggling because my trials are throwing up a disproportionate number of errors... My experiments are in PHP using ereg().

Anyway. I work with first and last names separately but for now using the same regex. So far I have:

^[A-Z][a-zA-Z]+$   

Any length string that starts with a capital and has only letters (capital or not) for the rest. But where I fall apart is dealing with the special situations that can pretty much occur anywhere.

  • Hyphenated Names (Worthington-Smythe)
  • Names with Apostrophes (D'Angelo)
  • Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.
  • Joint Names (Ben & Jerry)

Maybe there's some other form a name can take that I'm not thinking of, but I suspect if I can get my head around this, I can add to it. I'm pretty sure there will be instances where more than one of these situations comes up in one name.

So, I think the bottom line is to have my regex also accept spaces, hyphens, ampersands and apostrophes, but not at the start or end of the name, to be technically correct.

Humpton asked Nov 08 '08 20:11



2 Answers

This regex is perfect for me.

^([ \u00c0-\u01ffa-zA-Z'\-])+$ 

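With PHP's preg_match() the range has to be written as \x{00c0}-\x{01ff} and the pattern needs the u modifier, because PCRE does not accept \uXXXX escapes. The \u form shown here is the JavaScript-style spelling, so it doesn't work everywhere.

It matches Jérémie O'Co-nor, so it handles accented Latin names, but the range only covers U+00C0 to U+01FF, so names in other scripts (Cyrillic, Greek, CJK and so on) won't match.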
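
A minimal sketch of how this pattern might be used from PHP, assuming the input is UTF-8. The function name isLatinName is hypothetical, and the \x{...} spelling with the u modifier is swapped in for the \u escapes because PCRE does not accept the latter.

```php
<?php
// Hypothetical helper; assumes $name is a UTF-8 string.
function isLatinName(string $name): bool
{
    // Single-quoted PHP string: \x{...} reaches PCRE untouched, \' is a literal apostrophe.
    // The "u" modifier makes PCRE treat both pattern and subject as UTF-8.
    return preg_match('/^[ \x{00C0}-\x{01FF}a-zA-Z\'\-]+$/u', $name) === 1;
}

var_dump(isLatinName("Jérémie O'Co-nor")); // bool(true)
var_dump(isLatinName("Иван"));             // bool(false): Cyrillic is outside the range
```

The second call illustrates why the range should not be read as covering every UTF-8 name.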

Daan answered Oct 05 '22 19:10


  • Hyphenated Names (Worthington-Smythe)

Add a - into the second character class. The easiest way to do that is to add it at the start so that it can't possibly be interpreted as a range modifier (as in a-z).

^[A-Z][-a-zA-Z]+$
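
As a quick check of the hyphen placement, here is a small sketch using preg_match() (ereg() was removed in PHP 7.0). The helper name isHyphenOkName is made up for illustration.

```php
<?php
// Hypothetical helper; the pattern is the one given above.
function isHyphenOkName(string $name): bool
{
    // The leading "-" inside [...] is a literal hyphen, not a range operator.
    return preg_match('/^[A-Z][-a-zA-Z]+$/', $name) === 1;
}

var_dump(isHyphenOkName('Worthington-Smythe')); // bool(true)
var_dump(isHyphenOkName('-Smythe'));            // bool(false): must start with a capital
var_dump(isHyphenOkName('Smythe-'));            // bool(true): a trailing hyphen still passes
```

The last call shows that this pattern alone does not keep hyphens away from the end of the name.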

  • Names with Apostrophes (D'Angelo)

A naive way of doing this would be as above, giving:

^[A-Z][-'a-zA-Z]+$

Don't forget you may need to escape it inside the string! A 'better' way, given your example might be:

^[A-Z]'?[-a-zA-Z]+$

Which will allow a possible single apostrophe in the second position.
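
To illustrate both the escaping remark and the "second position" behaviour, a rough sketch follows. The helper name hasValidApostrophe is hypothetical.

```php
<?php
// Hypothetical helper; the pattern is the 'better' one given above.
function hasValidApostrophe(string $name): bool
{
    // Double quotes around the PHP string mean the apostrophe needs no escaping.
    // In a single-quoted PHP string it would have to be written as \'.
    return preg_match("/^[A-Z]'?[-a-zA-Z]+$/", $name) === 1;
}

var_dump(hasValidApostrophe("D'Angelo")); // bool(true)
var_dump(hasValidApostrophe("Dangelo"));  // bool(true): the apostrophe is optional
var_dump(hasValidApostrophe("O'Co'nor")); // bool(false): a second apostrophe is rejected
```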

  • Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.

Here I'd be tempted to just do our naive way again:

^[A-Z]'?[- a-zA-Z]+$

A potentially better way might be:

^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$

Which looks for extra words at the end. This probably isn't a good idea if you're trying to match names in a body of extra text, but then again, the original wouldn't have done that well either.
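
A sketch of the "extra words" idea, under the assumption that every word should be one or more letters, so each character class carries a + quantifier. The helper name isSpacedName is hypothetical.

```php
<?php
// Hypothetical helper; assumes each space-separated word has at least one letter.
function isSpacedName(string $name): bool
{
    // First word: capital, optional apostrophe, then letters or hyphens.
    // Each further word: exactly one space followed by one or more letters.
    return preg_match("/^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$/", $name) === 1;
}

var_dump(isSpacedName('Van der Humpton')); // bool(true)
var_dump(isSpacedName('Van  der'));        // bool(false): double space
var_dump(isSpacedName('Van '));            // bool(false): trailing space
```

Tying each space to a following word is what keeps spaces away from the end of the name, which the naive character-class version cannot do.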

  • Joint Names (Ben & Jerry)

At this point you're not looking at single names anymore?
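
One possible way to act on that observation, offered only as a sketch: split the input on the ampersand and validate each part as a single name. Both function names are hypothetical, and the single-name pattern is an assumed variant of the ones above.

```php
<?php
// Hypothetical single-name check (assumed pattern, one or more letters per word).
function isSingleName(string $name): bool
{
    return preg_match("/^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$/", $name) === 1;
}

// Hypothetical: accepts one name, or several names joined by "&".
function isJointName(string $input): bool
{
    // Split on "&" together with any whitespace around it.
    $parts = preg_split('/\s*&\s*/', $input);
    foreach ($parts as $part) {
        if (!isSingleName($part)) {
            return false; // an empty or malformed part fails the whole input
        }
    }
    return true;
}

var_dump(isJointName('Ben & Jerry')); // bool(true)
var_dump(isJointName('Ben &'));       // bool(false): nothing after the ampersand
```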

Anyway, as you can see, regexes have a habit of growing very quickly...

Matthew Scharley answered Oct 05 '22 17:10