Why does a regular expression for Cyrillic letters miss a letter? [duplicate]

I want to validate a text input field in an HTML page so that it accepts only Cyrillic letters. I have written the validation code in JavaScript using a regular expression like this:

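// Read the current text of the input field.
var namevalue = document.getElementById("name").value;
var letters = /^[А-Яа-я]+$/;
// RegExp.prototype.test returns true when the whole string matches.
if (letters.test(namevalue)) {
  alert("Accepted");
}
else {
  alert("Enter only cyrillic letters");
}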

This code works fine for all Cyrillic letters except Ё and ё.

Rey Rajesh asked Nov 04 '14 09:11



1 Answer

ё fails because it lies outside the ranges А-Я and а-я. Those ranges cover the basic Russian alphabet at U+0410 to U+044F (А-Я is U+0410 to U+042F, а-я is U+0430 to U+044F), but Ё is U+0401 and ё is U+0451. Both sit in the extension rows of the Cyrillic block (U+0400 to U+040F and U+0450 to U+045F). A JavaScript regex character range compares characters by their code values, not by alphabetical order, so Ё and ё simply fall outside the range.
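To make the code point argument concrete, here is a minimal sketch that prints the code points involved and checks range membership the way a character class does. The helper names `toCodePoint` and `inRange` are made up for this illustration.

```javascript
// Format a character's code point as U+XXXX.
function toCodePoint(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// True when ch lies between lo and hi by code point, which is how a
// regex character range such as [А-Яа-я] decides membership.
function inRange(ch, lo, hi) {
  const cp = ch.codePointAt(0);
  return cp >= lo.codePointAt(0) && cp <= hi.codePointAt(0);
}

// А..я spans U+0410 to U+044F without gaps, so one check covers both ranges.
for (const ch of ["А", "Я", "а", "я", "Ё", "ё"]) {
  console.log(ch, toCodePoint(ch), inRange(ch, "А", "я"));
}
```

The four range endpoints report true, while Ё (U+0401) and ё (U+0451) report false.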

I presume you mean modern Russian, where ё is relatively rare but still widely used, so I suggest this solution:

// Read the current text of the input field.
var namevalue = document.getElementById("name").value;

// Note that "ёЁ" has been added to your pattern.
// It now matches all Russian Cyrillic letters, both lowercase and uppercase,
// including ё and Ё.
var letters = /^[А-Яа-яёЁ]+$/;

// Call test() on the regex, passing the string to check.
if (letters.test(namevalue)) {
   alert("Accepted");
}
else {
   alert("Enter only cyrillic letters");
}
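The same pattern can be tried without a page or an input field. This is a small sketch for checking it from a console; the function name `isRussianLettersOnly` and the sample strings are assumptions made for this example.

```javascript
// Russian letters only: the two contiguous ranges plus Ё and ё listed explicitly.
const RUSSIAN_LETTERS = /^[А-Яа-яЁё]+$/;

// Returns true when text is non-empty and consists solely of Russian letters.
function isRussianLettersOnly(text) {
  return RUSSIAN_LETTERS.test(text);
}

for (const sample of ["Привет", "Ёлка", "ёж", "Hello", "Привет мир", ""]) {
  console.log(JSON.stringify(sample), isRussianLettersOnly(sample));
}
```

Note that the pattern rejects spaces, digits and the empty string, so a field holding a full name with a space would need the character class extended.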

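Unfortunately, the gap between А-Я and Ё comes from the way Unicode assigns code points: Ё and ё were placed outside the contiguous run of the alphabet, so no single range covers the whole Russian alphabet. Robust code has to list such letters explicitly and be prepared for similar cases.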
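Where the runtime supports Unicode property escapes (the `u` flag with `\p{...}`, available in ES2018 and later engines), one alternative is to match by script rather than by range. This is a sketch under that assumption, not a drop-in replacement: it accepts every Cyrillic letter, including ones used in Ukrainian, Serbian and other languages, which is broader than the Russian alphabet. The function name `isCyrillicLettersOnly` is made up for this example.

```javascript
// Each character must be a letter (\p{L}) and belong to the Cyrillic script.
// The lookahead keeps out Cyrillic-script characters that are not letters.
const CYRILLIC_LETTERS = /^(?:(?=\p{L})\p{Script=Cyrillic})+$/u;

// Returns true when text is non-empty and consists solely of Cyrillic letters.
function isCyrillicLettersOnly(text) {
  return CYRILLIC_LETTERS.test(text);
}

console.log(isCyrillicLettersOnly("Ёлка"));   // Russian word containing Ё
console.log(isCyrillicLettersOnly("їжак"));   // Ukrainian word, also accepted
console.log(isCyrillicLettersOnly("Hello"));  // Latin letters, rejected
```

If only Russian input should pass, the explicit class with Ё and ё added stays the stricter choice.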

Mark Zucchini answered Oct 13 '22 18:10