preg_match(???, 'firstname lastname'); // true
preg_match(???, '서프 누워'); // true
preg_match(???, '서프 lastname'); // false
preg_match(???, '#$@ #$$#'); // false
Currently I use:
'/^([一-龠0-9\s]+|[ぁ-ゔ0-9\s]+|[ก-๙0-9\s]+|[ァ-ヴー0-9\s]+|[a-zA-Z0-9\s]+|[々〆〤0-9\s]+)$/u'
But it only works on some languages.
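$ means "match the end of the string" (the position after the last character in the string). In PCRE, which preg_match uses, $ also matches just before a single trailing newline unless the D modifier is set.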
Match any specific character in a set: use square brackets [] to match any one of the characters in a set. Use \w to match any single alphanumeric character: 0-9, a-z, A-Z, and _ (underscore). Use \d to match any single digit. Use \s to match any single whitespace character.
To check that a string contains only letters (uppercase or lowercase), use the regular expression /^[A-Za-z]+$/, which allows only ASCII letters.
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
You need an expression that matches only characters from the same Unicode script (plus spaces), like:
^([\p{SomeScript} ]+|[\p{SomeOtherScript} ]+|...)$
You can build this expression dynamically from the list of scripts:
$scripts = "Hangul Hiragana Han Latin Cyrillic"; // feel free to add more
$re = [];
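foreach (explode(' ', $scripts) as $s) {
    // one alternative per script: characters of that script, plus spaces
    $re[] = sprintf('[\p{%s} ]+', $s);
}
// the "u" modifier makes PCRE treat the pattern and the subject as UTF-8
$re = "~^(" . implode("|", $re) . ")$~u";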
print preg_match($re, 'firstname lastname'); // 1
print preg_match($re, '서프 누워'); // 1
print preg_match($re, '서프 lastname'); // 0
print preg_match($re, '#$@ #$$#'); // 0
Note, however, that names (at least in the European scripts I'm familiar with) commonly include characters such as dots, dashes and apostrophes, which belong to the "Common" script rather than to a language-specific one. To take these into account, a more realistic version of a "chunk" in the above expression could look like this:
((\p{SomeScript}+(\. ?|[ '-]))*\p{SomeScript}+)
which will at least correctly validate L. A. Léon de Saint-Just.
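The more realistic "chunk" can be plugged into the same per-script loop. What follows is only a minimal sketch: the helper name buildNamePattern is hypothetical, the script list is the same illustrative one as above, and accented letters are assumed to be precomposed (NFC), since combining marks are not part of the Latin script.

```php
<?php
// Hypothetical helper (not from the original answer): builds one alternative
// per script, where each alternative is the "realistic chunk" pattern.
function buildNamePattern(array $scripts): string
{
    $chunks = [];
    foreach ($scripts as $script) {
        // one or more letters of a single script, e.g. \p{Latin}+
        $word = sprintf('\p{%s}+', $script);
        // zero or more "word + separator" pairs, followed by a final word;
        // a separator is ". ", ".", a space, an apostrophe or a hyphen
        $chunks[] = sprintf("(?:%s(?:\\. ?|[ '-]))*%s", $word, $word);
    }
    // "u" makes PCRE treat the pattern and the subject as UTF-8
    return '~^(?:' . implode('|', $chunks) . ')$~u';
}

$re = buildNamePattern(['Hangul', 'Hiragana', 'Han', 'Latin', 'Cyrillic']);

var_dump(preg_match($re, 'L. A. Léon de Saint-Just')); // int(1)
var_dump(preg_match($re, '서프 누워'));                 // int(1)
var_dump(preg_match($re, '서프 lastname'));             // int(0)
var_dump(preg_match($re, '#$@ #$$#'));                  // int(0)
```

Unlike the simpler version, this one requires every separator to sit between two words, so a name ending in a hyphen or containing a double space is rejected. Whether that is desirable depends on the data you expect.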
In general, validating people's names is a complicated problem and cannot be solved with 100% accuracy. See this funny post and comments therein for details and examples.