
How to match Cyrillic characters with a regular expression

How do I match French and Russian Cyrillic alphabet characters with a regular expression? I only want to match alphabetic characters, not numbers or special characters. Right now I have:

[A-Za-z]

Greg Finzer asked Nov 11 '09 17:11



2 Answers

If your regex flavor supports Unicode blocks or scripts (the exact property name varies by flavor), you can match Cyrillic characters with:

[\p{IsCyrillic}] or [\p{Cyrillic}] 

Otherwise, try an explicit code point range covering U+0400–U+04FF:

[\u0400-\u04FF]

For PHP (with the u modifier, so the pattern is treated as UTF-8) use:

[\x{0400}-\x{04FF}] 

Explanation:

[\p{IsCyrillic}] matches a single character from the Unicode block “Cyrillic” (U+0400–U+04FF).

Note:

Unicode character list and numeric HTML entities for the range U+0400–U+04FF.
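
As an illustration of the explicit-range fallback, here is a minimal sketch in Python, whose standard re module does not support \p{...} properties. The helper names are hypothetical and not part of the original answer; the range is the Cyrillic block U+0400–U+04FF mentioned above.

```python
import re

# Explicit code point range for the Cyrillic block (U+0400 to U+04FF).
# Python's built-in `re` has no \p{...} support, so the range is spelled out.
CYRILLIC_BLOCK = re.compile(r"[\u0400-\u04FF]+")


def is_cyrillic_only(text: str) -> bool:
    """Return True if text is non-empty and every character is in the block."""
    return CYRILLIC_BLOCK.fullmatch(text) is not None


def find_cyrillic_runs(text: str) -> list[str]:
    """Return each maximal run of consecutive Cyrillic-block characters."""
    return CYRILLIC_BLOCK.findall(text)


if __name__ == "__main__":
    print(is_cyrillic_only("Привет"))
    print(find_cyrillic_runs("abc Привет мир 123"))
```

Keep in mind that the block is not strictly letters: it also contains a few symbols and combining marks (for example U+0482, the Cyrillic thousands sign), so a stricter letters-only check may need a narrower range.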

Pedro Lobito answered Sep 22 '22 02:09


It depends on your regex flavor. If it supports Unicode character classes (as .NET does, for instance), \p{L} matches a letter in any script, which covers both French and Russian letters.
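
Where \p{L} is not available, a similar effect can sometimes be approximated. The sketch below uses Python's standard re module, which lacks \p{L}; it swaps in the class [^\W\d_] (word characters minus decimal digits and the underscore) instead. This is an approximation, not an exact equivalent of \p{L}, and the helper name is hypothetical.

```python
import re

# Approximation of \p{L} for Python's built-in `re`:
# start from \w (Unicode word characters), then exclude decimal digits and "_".
# Not an exact match for \p{L}: a few non-decimal numeric characters still pass.
LETTERS_ONLY = re.compile(r"[^\W\d_]+")


def is_letters_only(text: str) -> bool:
    """Return True if text is non-empty and consists only of letter-like characters."""
    return LETTERS_ONLY.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["Привет", "Français", "abc123", "snake_case"]:
        print(sample, is_letters_only(sample))
```

The same check accepts Latin letters with diacritics and Cyrillic letters alike, which is the behavior the question asks for, while rejecting digits and punctuation.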

Tim Pietzcker answered Sep 23 '22 02:09