Allow only some letters, ban special characters ($% etc.) except others (' -)

I need a Regex for PHP to do the following:

I want to allow [a-zα-ωá-źа-яա-ֆა-ჰא-ת] plus Chinese, Japanese and other UTF-8 letters, and I want to ban [^٩٨٧٦٥٤٣٢١٠۰۱۲۳۴۵۶۷۸۹] (Arabic numerals).

This is what I've done:

function isValidFirstName($first_name) {
    return preg_match("/^(?=[a-zα-ωá-źа-яա-ֆა-ჰא-ת]+([a-zα-ωá-źа-яա-ֆა-ჰא-ת' -]+)?\z)[a-zα-ωá-źа-яա-ֆა-ჰא-ת' -]+$/i", $first_name);
}

It looks like it works, but if I type letters of more than 1 language, it doesn't validate.

Examples: Авпа Вапапва á-ź John - doesn't validate. John Gger - validates, á-ź á-ź - validates.

I would like all of these to validate.

Or, if there's a way, I'd like to echo a message when the user has entered a string that mixes more than one language.

asked May 08 '12 10:05 by Hypn0tizeR


2 Answers

I can't reproduce the failure cases here (Авпа Вапапва á-ź John validates just fine), but you can simplify the regex a lot - you don't need that lookahead assertion:

preg_match('/^[a-zα-ωá-źа-яա-ֆა-ჰא-ת][a-zα-ωá-źа-яա-ֆა-ჰא-ת\' -]*$/iu', $first_name)

As far as I can tell from the character ranges you've given, you don't need to exclude the digits because anything outside these character classes will already cause the regex to fail.
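To make the point about digits concrete, here is a minimal sketch that runs the simplified pattern over a few inputs. It assumes the input is UTF-8 and that the u modifier is set (without it, PCRE compares bytes rather than characters, so the ranges above would not mean what they appear to). The function name isValidFirstName comes from the question; the sample strings are only illustrative.

```php
<?php
// Sketch only: assumes $first_name is a UTF-8 string.
function isValidFirstName(string $first_name): bool
{
    // Allowed letter ranges, copied from the question.
    $letters = 'a-zα-ωá-źа-яա-ֆა-ჰא-ת';
    // One allowed letter first, then letters, apostrophes, spaces or hyphens.
    $pattern = "/^[{$letters}][{$letters}' -]*\$/iu";
    return preg_match($pattern, $first_name) === 1;
}

var_dump(isValidFirstName('Авпа Вапапва á-ź John')); // mixed scripts
var_dump(isValidFirstName("O'Neil"));                // apostrophe inside the name
var_dump(isValidFirstName('John٣'));                 // contains an Arabic-Indic digit
var_dump(isValidFirstName('-John'));                 // does not start with a letter
```

The first two inputs should be accepted and the last two rejected, with no separate digit check needed.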

Another consideration: If your goal is to allow any letter from any language/script (plus some punctuation and space) you can (if you're using Unicode strings) further simplify this to:

preg_match('/^\pL[\pL\' -]*$/iu', $first_name)
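As a rough sketch of how that Unicode-property version might be used, the snippet below wraps it in a helper and tries a few names. The helper name isValidNameAnyScript and the sample inputs are my own assumptions, not part of the question, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper name; assumes $name is a UTF-8 string.
function isValidNameAnyScript(string $name): bool
{
    // \pL matches any Unicode letter; the u modifier makes PCRE read UTF-8.
    return preg_match('/^\pL[\pL\' -]*$/u', $name) === 1;
}

// Try a mixed-script name, a Japanese name, a name with a digit, and an empty string.
foreach (['Авпа á-ź John', '山田 太郎', 'John٣', ''] as $candidate) {
    printf("'%s' => %s\n", $candidate, isValidNameAnyScript($candidate) ? 'valid' : 'invalid');
}
```

Note that \pL covers letters only, so the Arabic-Indic digits from the question (category Nd) are still rejected. Combining marks (\pM), which some scripts need, would be rejected too.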

But generally, I wouldn't try to validate a name with regular expressions (or by any other means); see "Falsehoods Programmers Believe About Names".

answered Nov 02 '22 19:11 by Tim Pietzcker


You can detect characters of a given script with a Unicode script property. The example below tests for Hebrew; for Arabic characters, use \p{Arabic} instead:

if (preg_match('/(?:[\p{Hebrew}]+)/imu', $subject)) {
    # Successful match
} else {
    # Match attempt failed
}

RegEx explanation

<!--
(?:[\p{Hebrew}]+) with modifiers i, m, u

Options: case insensitive (i); ^ and $ match at line breaks (m); pattern and subject treated as UTF-8 (u)

Match the regular expression below «(?:[\p{Hebrew}]+)»
   A character in the Unicode script “Hebrew” «[\p{Hebrew}]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
-->
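The question also asked for a way to show a message when a name mixes languages. Building on the script-property idea, here is a rough sketch that tests the string against a handful of \p{...} script names and reports which ones occur. The function name detectScripts and the list of scripts are assumptions of mine for illustration; the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: returns the names of the listed scripts found in $text.
function detectScripts(string $text): array
{
    // Illustrative subset of PCRE script names; extend as needed.
    $scripts = ['Latin', 'Greek', 'Cyrillic', 'Armenian', 'Georgian', 'Hebrew', 'Arabic', 'Han'];
    $found = [];
    foreach ($scripts as $script) {
        // A single character of the script anywhere in the string is enough.
        if (preg_match('/\p{' . $script . '}/u', $text) === 1) {
            $found[] = $script;
        }
    }
    return $found;
}

$used = detectScripts('Авпа Вапапва á-ź John');
if (count($used) > 1) {
    echo 'More than one script used: ' . implode(', ', $used) . "\n";
}
```

A script is not the same thing as a language (English and Polish both use Latin), so this only approximates "more than one language".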
answered Nov 02 '22 20:11 by Cylian