
How to match the international alphabet (English a-z, + non English) with a regular expression?

Tags: regex, unicode

I want to allow only input made up of letters from the English alphabet and from other alphabets, such as German.

For example German letters like öäü, French letters like áê, or Chinese characters like ...

How can I write my regular expression so that it accepts alphabetic characters from any language?

msfanboy asked Mar 06 '10 10:03




1 Answer

Since you specifically ask about Unicode: \p{L} is the shorthand for any Unicode letter. Not all regex flavors support this syntax, though. .NET, Perl, Java and the JGSoft regex engine do; Python's built-in re module, for example, does not (the third-party regex module does).

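So, for example, \b\p{L}+\b will match an entire word made up of Unicode letters. To accept only input that consists entirely of letters, anchor the pattern instead: ^\p{L}+$.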
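
As a rough illustration in Java, one of the flavors listed above, here is a minimal sketch of both uses of \p{L}: checking that a whole input consists of letters, and pulling letter-only words out of a longer text. The class name UnicodeLetters and the helpers isAllLetters and findWords are made up for this example. The UNICODE_CHARACTER_CLASS flag is an added assumption on my part, so that \b treats non-ASCII letters as word characters regardless of the JDK's default behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class UnicodeLetters {
    // \p{L} matches any code point in the Unicode "Letter" category.
    private static final Pattern ONLY_LETTERS = Pattern.compile("\\p{L}+");

    // Assumption: request Unicode-aware \b explicitly (flag exists since Java 7).
    private static final Pattern WORD =
            Pattern.compile("\\b\\p{L}+\\b", Pattern.UNICODE_CHARACTER_CLASS);

    // True if the input is non-empty and every character is a letter.
    // matches() requires the whole input to match, so no ^...$ is needed.
    static boolean isAllLetters(String input) {
        return ONLY_LETTERS.matcher(input).matches();
    }

    // Collects every run of letters that is delimited by word boundaries.
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // \u00fc is "ü"; escapes keep the source file ASCII-only.
        System.out.println(isAllLetters("M\u00fcller")); // true
        System.out.println(isAllLetters("R2D2"));        // false
        System.out.println(findWords("Zo\u00eb mag \u00c4pfel"));
    }
}
```

Note that \p{L} covers letters only, so digits, spaces, hyphens and apostrophes are rejected by isAllLetters. If names such as "Jean-Luc" should pass, the character class would have to be widened accordingly.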

Tim Pietzcker answered Oct 08 '22 17:10
