
Accept international name characters in RegEx

Tags: regex, php

I've always struggled with RegEx, so forgive me if this seems like an awful approach to my problem.

When users enter first and last names, I started off with a basic check for upper- and lower-case letters, white space, apostrophes and hyphens:

if (!preg_match("/^[a-zA-Z\s'-]+$/", $name)) {
    // Error
}

Now I realise this isn't the best, since people could have names such as Dr. Martin Luther King, Jr. (with commas and full stops). So I assume changing it to this would make it slightly more effective:

if (!preg_match("/^[a-zA-Z\s,.'-]+$/", $name)) {
    // Error
}

I then saw that a girl I know on Facebook writes her name as Siân, which got me thinking about names that contain accented characters such as umlauts, as well as Japanese/Chinese/Korean/Russian characters. So I started searching and found that I could list each of these characters in the class, like so:

if (!preg_match("/^[a-zA-Z\sàáâäãåèéêëìíîïòóôöõøùúûüÿýñçčšžÀÁÂÄÃÅÈÉÊËÌÍÎÏÒÓÔÖÕØÙÚÛÜŸÝÑßÇŒÆČŠŽ∂ð ,.'-]+$/u", $first_name)) {
    // Error
}

As you can imagine, it's extremely long-winded, and I'm pretty certain there is a much simpler RegEx that can achieve this. Like I've said, I've searched around, but this is the best I can do.

So, what is a good way to check for upper- and lower-case characters, commas, full stops, apostrophes, hyphens, umlauts, and Latin, Japanese, Russian and other letters?

asked Nov 04 '11 18:11 by no.




1 Answer

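You can use a Unicode character class. \pL covers pretty much all letter symbols, so a-zA-Z no longer needs to be listed separately. Keep the hyphen last in the class so that it is read as a literal rather than as the start of a range; PCRE2, which recent PHP versions use, reports an invalid range for '-\pL'.
http://php.net/manual/en/regexp.reference.unicode.php

if (!preg_match("/^[\pL\s,.'-]+$/u", $name)) {
    // Error
}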

See also http://www.regular-expressions.info/unicode.html, but beware that PHP/PCRE only understands the abbreviated class names.
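
To see how the \pL approach behaves on the kinds of names mentioned in the question, here is a minimal sketch. The helper name isPlausibleName and the sample list are made up for illustration and are not part of the original answer; the pattern is the \pL class with the hyphen placed last.

```php
<?php
// Hypothetical helper for illustration; the name is an assumption, not an existing API.
function isPlausibleName(string $name): bool
{
    // \pL matches any Unicode letter; \s , . ' and - are the extra allowed characters.
    // The hyphen is last in the class so it is a literal, not a range operator.
    // The /u modifier makes PCRE treat the pattern and the subject as UTF-8.
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example, invalid UTF-8 input), so compare strictly against 1.
    return preg_match("/^[\pL\s,.'-]+$/u", $name) === 1;
}

// Sample inputs: accented Latin, punctuation, Cyrillic, CJK, and one with digits.
$samples = ["Siân", "Dr. Martin Luther King, Jr.", "Иван Петров", "山田 太郎", "R2-D2"];

foreach ($samples as $sample) {
    echo $sample, ': ', isPlausibleName($sample) ? 'accepted' : 'rejected', PHP_EOL;
}
```

The last sample is rejected because digits are not in the class; everything else consists only of letters plus the allowed punctuation and white space. An empty string is also rejected, since + requires at least one character.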

answered Oct 10 '22 22:10 by mario