I've always struggled with RegEx so forgive me if this may seem like an awful approach at tackling my problem.
When users are entering first and last names I started off just using the basic, check for upper and lower case, white space, apostrophes and hyphens
if (!preg_match("/^[a-zA-Z\s'-]+$/", $name)) { // Error }
Now I realise this isn't the best since people could have things such as: Dr. Martin Luther King, Jr. (with comma's and fullstops). So I assume by changing it to this would make it slightly more effective.
if (!preg_match("/^[a-zA-Z\s,.'-]+$/", $name)) { // Error }
I then saw a girls name I know on my Facebook who writes her name as Siân, which got me thinking of names which contain umlauts as well as say Japanese/Chinese/Korean/Russian characters too. So I started searching and found ways by writing each of these characters in there like so.
if (!preg_match("/^[a-zA-Z\sàáâäãåèéêëìíîïòóôöõøùúûüÿýñçčšžÀÁÂÄÃÅÈÉÊËÌÍÎÏÒÓÔÖÕØÙÚÛÜŸÝÑßÇŒÆČŠŽ∂ð ,.'-]+$/u", $first_name)) { // Error }
As you can imagine, it's extremely long winded and I'm pretty certain there is a much simpler RegEx which can achieve this. Like I've said, I've searched around but this is the best I can do.
So, what is a good way to check for upper and lower case characters, commas, full stops, apostrophes, hypens, umlauts, Latin, Japanese/Russian etc
p{L} => matches any kind of letter character from any language. p{N} => matches any kind of numeric character. *- => matches asterisk and hyphen. + => Quantifier — Matches between one to unlimited times (greedy)
To match any character except a list of excluded characters, put the excluded charaters between [^ and ] . The caret ^ must immediately follow the [ or else it stands for just itself. The character '. ' (period) is a metacharacter (it sometimes has a special meaning).
Character classes The character class is the most basic regex concept after a literal match. It makes one small sequence of characters match a larger set of characters. For example, [A-Z] could stand for any uppercase letter in the English alphabet, and \d could mean any digit.
You can use an Unicode character class. \pL
covers pretty much all letter symbols.
http://php.net/manual/en/regexp.reference.unicode.php
if (!preg_match("/^[a-zA-Z\s,.'-\pL]+$/u", $name))
See also http://www.regular-expressions.info/unicode.html, but beware that PHP/PCRE only understands the abbreviated class names.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With