Okay, I have read about regex all day now and still don't understand it properly. What I'm trying to do is validate a name, but the functions I can find for this on the internet only use [a-zA-Z], leaving out characters that I need to accept too.
I basically need a regex that checks that the name is at least two words and that it does not contain numbers or special characters like !"#¤%&/()=..., however the words can contain characters like æ, é, Â and so on...
An example of an accepted name would be: "John Elkjærd" or "André Svenson"
A non-accepted name would be: "Hans", "H4nn3 Andersen" or "Martin Henriksen!"
If it matters, I use the JavaScript .match() function client side and want to use PHP's preg_replace() server side, only "in negative" (removing non-matching characters).
Any help would be much appreciated.
Update:
Okay, thanks to Alix Axel's answer I have the important part down, the server side one.
But as the page from LightWing's answer suggests, I'm unable to find anything about Unicode support for JavaScript, so I ended up with half a solution for the client side, just checking for at least two words and a minimum of 5 characters, like this:
// match() returns null when nothing matches (e.g. an empty name), so fall back to []
if ((name.match(/\S+/g) || []).length >= minWords && name.length >= 5) {
    // valid
}
An alternative would be to specify all the Unicode characters as suggested in shifty's answer, which I might end up doing along with the solution above, although it is a bit impractical.
Try the following regular expression:
^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$
In PHP this translates to:
if (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0)
{
// valid
}
You should read it like this:
^                   # start of subject
(?:                 # match this:
    [               # match a:
        \p{L}       # Unicode letter, or
        \p{Mn}      # Unicode accents, or
        \p{Pd}      # Unicode hyphens, or
        \'          # single quote, or
        \x{2019}    # single quote (alternative)
    ]+              # one or more times
    \s              # any kind of space
    [               # match a:
        \p{L}       # Unicode letter, or
        \p{Mn}      # Unicode accents, or
        \p{Pd}      # Unicode hyphens, or
        \'          # single quote, or
        \x{2019}    # single quote (alternative)
    ]+              # one or more times
    \s?             # any kind of space (0 or 1 times)
)+                  # one or more times
$                   # end of subject
I honestly don't know how to port this to JavaScript, and I'm not even sure JavaScript supports Unicode properties, but in PHP (PCRE) this seems to work flawlessly @ IDEOne.com:
$names = array
(
'Alix',
'André Svenson',
'H4nn3 Andersen',
'Hans',
'John Elkjærd',
'Kristoffer la Cour',
'Marco d\'Almeida',
'Martin Henriksen!',
);
foreach ($names as $name)
{
echo sprintf('%s is %s' . "\n", $name, (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0) ? 'valid' : 'invalid');
}
I'm sorry I can't help you regarding the Javascript part but probably someone here will.
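For what it's worth, a possible JavaScript port is sketched below. It assumes an engine that supports ES2018 Unicode property escapes (the u flag on the regex), which older browsers lack. The names NAME_PATTERN and isValidName are made up for illustration. Note that under the u flag the single quote must not be backslash-escaped inside the character class.

```javascript
// Sketch only: requires an ES2018+ engine, where the `u` flag enables \p{...}.
// NAME_PATTERN and isValidName are hypothetical names, not from the answer above.
const NAME_PATTERN =
  /^(?:[\p{L}\p{Mn}\p{Pd}'\u2019]+\s[\p{L}\p{Mn}\p{Pd}'\u2019]+\s?)+$/u;

function isValidName(name) {
  // test() returns a plain boolean and never null, unlike match()
  return NAME_PATTERN.test(name);
}

const names = [
  "Alix",
  "André Svenson",
  "H4nn3 Andersen",
  "Hans",
  "John Elkjærd",
  "Kristoffer la Cour",
  "Marco d'Almeida",
  "Martin Henriksen!",
];

for (const name of names) {
  console.log(`${name} is ${isValidName(name) ? "valid" : "invalid"}`);
}
```

On engines without property escapes the regex literal is a syntax error, so it would have to be wrapped in a feature check or a try/catch around new RegExp(...).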
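Validates: André Svenson, John Elkjærd, Kristoffer la Cour, Marco d'Almeida
Invalidates: Alix, H4nn3 Andersen, Hans, Martin Henriksen!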
To replace invalid characters, though I'm not sure why you need this, you just need to change it slightly:
$name = preg_replace('~[^\p{L}\p{Mn}\p{Pd}\'\x{2019}\s]~u', '', $name);
Examples:
Note that you always need to use the u modifier.
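If the same "negative" stripping is ever wanted client side, a rough JavaScript counterpart might look like the sketch below. It again assumes an ES2018+ engine with the u flag. INVALID_NAME_CHARS and stripInvalidNameChars are hypothetical names.

```javascript
// Sketch only: negated character class, assuming ES2018 property escapes.
// The g flag replaces every offending character, not just the first one.
const INVALID_NAME_CHARS = /[^\p{L}\p{Mn}\p{Pd}'\u2019\s]/gu;

function stripInvalidNameChars(name) {
  return name.replace(INVALID_NAME_CHARS, "");
}

console.log(stripInvalidNameChars("H4nn3 Andersen"));
console.log(stripInvalidNameChars("Martin Henriksen!"));
```

Stripping silently changes what the user typed, so it is probably safer to run the validation first and only strip as a last resort.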
Regarding JavaScript it is more tricky, since JavaScript regex syntax did not support Unicode character properties before ES2018 (which added \p{...} under the u flag). A pragmatic solution for engines without that support would be to match letters like this:
[a-zA-Z\xC0-\uFFFF]
This allows letters in all languages and excludes ASCII digits and all the special (non-letter) characters commonly found on keyboards. It is imperfect because it also allows Unicode symbols which are not letters, e.g. emoticons, the snowman and so on. However, since these symbols are typically not available on keyboards I don't think they will be entered by accident. So depending on your requirements it may be an acceptable solution.
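As a minimal sketch, that range could be combined with a two-word check like this. The overall pattern (words separated by whitespace) and the name looksLikeFullName are my own assumptions, not something taken from the answers.

```javascript
// Sketch only: works without the `u` flag, so it also runs on older engines.
// At least two "words" built from ASCII letters or anything from U+00C0 upward.
const TWO_WORDS = /^[a-zA-Z\xC0-\uFFFF]+(?:\s+[a-zA-Z\xC0-\uFFFF]+)+$/;

function looksLikeFullName(name) {
  return TWO_WORDS.test(name);
}

console.log(looksLikeFullName("John Elkjærd"));      // accented letters pass
console.log(looksLikeFullName("Martin Henriksen!")); // "!" is below U+00C0
console.log(looksLikeFullName("John \u2603"));       // snowman slips through
```

The last call shows the imperfection described above: U+2603 (snowman) falls inside the range, so it is accepted as if it were a letter.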