I have a string with some Cyrillic words inside. Each starts with a capital letter.
var str = 'ХєлпМіПліз';
I have found this solution: str.match(/[А-Я][а-я]+/g).
But it returns ["Пл"] instead of ["Хєлп", "Мі", "Пліз"]. It seems that it doesn't recognize Ukrainian letters ('і', 'є'), only Russian ones.
So how do I have to change that regex to include Ukrainian letters?
[А-Я] is not the Cyrillic alphabet, it's just Russian!
Cyrillic is a writing system. It is used in the alphabets of many languages. (Like Latin, which is the character set for West European languages, East European ones, etc.)
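A range inside a character class is a range of code points, which is why the Ukrainian letters fall outside it. The sketch below prints the relevant code points; the helper names hex and inRussianRange are made up for this illustration.

```javascript
// Format a character's code point as U+XXXX.
const hex = (ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');

console.log(hex('А'), hex('Я')); // U+0410 U+042F
console.log(hex('а'), hex('я')); // U+0430 U+044F
console.log(hex('є'), hex('і')); // U+0454 U+0456, above the а-я range

// Hypothetical helper: is the character covered by the ranges from the question?
const inRussianRange = (ch) => /[А-Яа-я]/.test(ch);

console.log(inRussianRange('л')); // true
console.log(inRussianRange('є')); // false
```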
To cover both Russian and Ukrainian you'd use [А-ЯҐЄІЇ].
To add Belarusian: [А-ЯҐЄІЇЎ]
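Applied to the string from the question, the Russian plus Ukrainian class could look like the sketch below. The lowercase class [а-яґєії] is an assumption added here as the counterpart of the uppercase one; the answer itself only spells out the uppercase set.

```javascript
const str = 'ХєлпМіПліз';

// One uppercase letter followed by one or more lowercase letters,
// with the Ukrainian letters added to both classes.
const words = str.match(/[А-ЯҐЄІЇ][а-яґєії]+/g);

console.log(words); // [ 'Хєлп', 'Мі', 'Пліз' ]
```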
And for all Cyrillic characters (including those of Balkan languages and Old Cyrillic), you can use a Unicode script class such as \p{IsCyrillic}, where the regex engine supports it.
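In JavaScript the \p{IsCyrillic} spelling is not available, but engines that implement ES2018 Unicode property escapes accept \p{Script=Cyrillic} when the regex has the u flag. A minimal sketch, assuming such an engine; the sample text is made up:

```javascript
// Latin words mixed with a Ukrainian and a Serbian word.
const mixed = 'abc Хєлп ђак xyz';

// The u flag is required; without it \p{...} is not treated as a property escape.
const cyrillicRuns = mixed.match(/\p{Script=Cyrillic}+/gu);

console.log(cyrillicRuns); // [ 'Хєлп', 'ђак' ]
```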
[А-ЩЬЮЯҐЄІЇ] or [А-ЩЬЮЯҐЄІЇа-щьюяґєії] seems to be the full Ukrainian alphabet, 33 letters in each case.
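The 33-letter claim can be checked with a quick sketch. The alphabet string below is typed out by hand for the test and is not part of the answer.

```javascript
// The Ukrainian alphabet in its usual order, uppercase.
const alphabet = 'АБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ';

const upperOnly = /[А-ЩЬЮЯҐЄІЇ]/g;
const bothCases = /[А-ЩЬЮЯҐЄІЇа-щьюяґєії]/g;

// Every letter of the alphabet is matched by the class.
const upperCount = alphabet.match(upperOnly).length;
const lowerCount = alphabet.toLowerCase().match(bothCases).length;
console.log(upperCount, lowerCount); // 33 33

// Russian-only letters are left out.
const russianOnly = 'ЪЫЭЁ'.match(upperOnly);
console.log(russianOnly); // null
```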
The apostrophe is not a letter, but it is occasionally included in the alphabet because it affects the following vowel. The apostrophe is part of the word, not a separator. It may be written in a few ways:
U+0027 "'" APOSTROPHE
U+0060 "`" GRAVE ACCENT
U+2019 "’" RIGHT SINGLE QUOTATION MARK
U+02BC "ʼ" MODIFIER LETTER APOSTROPHE
and maybe some more.
Yes, the apostrophe makes it a bit complicated. There is no common standard for it.
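One way to handle this is to allow an apostrophe only between letters, so that it stays part of the word while surrounding quotes are not picked up. This is a sketch under that assumption, covering just the four variants listed above; the variable names and the sample text are made up.

```javascript
const letters = 'А-ЩЬЮЯҐЄІЇа-щьюяґєії';
// U+0027, U+0060, U+2019, U+02BC
const apostrophes = "'`\u2019\u02BC";

// Letters, optionally followed by (apostrophe + letters) groups.
const wordPattern = new RegExp(
  '[' + letters + ']+(?:[' + apostrophes + '][' + letters + ']+)*',
  'g'
);

const sample = "з'їсти, п\u2019ять, об\u02BCєкт, 'слово'";
const sampleWords = sample.match(wordPattern);

console.log(sampleWords); // [ "з'їсти", 'п’ять', 'обʼєкт', 'слово' ]
```

The quotes around the last word are dropped because an apostrophe only matches when letters follow it.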
Use \p{Lu} for an uppercase match, \p{Ll} for lowercase, or \p{L} to match any letter.
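Update: as written, that works in Java but not in JavaScript, where \p{...} escapes are only recognized by ES2018+ engines in regexes that have the u flag. Don't forget to include the apostrophe and "ї" in your regexp.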
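In JavaScript engines that implement ES2018 Unicode property escapes, the same classes are available once the regex carries the u flag. A minimal sketch under that assumption, using the string from the question:

```javascript
const input = 'ХєлпМіПліз';

// \p{Lu} = uppercase letter, \p{Ll} = lowercase letter; both need the u flag.
const capitalized = input.match(/\p{Lu}\p{Ll}+/gu);

console.log(capitalized); // [ 'Хєлп', 'Мі', 'Пліз' ]
```

These classes match letters of any script, not only Cyrillic ones.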
The Ukrainian alphabet has four letters that the Russian alphabet does not: і, є, ї, ґ. Words can also contain an apostrophe (a single quote).
"ґуля, з'їсти, істота, Європа".match(/[а-яієїґ\']+/ig)
The i flag at the end makes it match upper case as well, as with "Європа".
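The effect of the flag is easy to see by running the same pattern with and without it. This is a small sketch; the variable names are made up.

```javascript
const text = "ґуля, з'їсти, істота, Європа";

// Case-insensitive: the uppercase Є is matched through its lowercase form in the class.
const withFlag = text.match(/[а-яієїґ']+/ig);
console.log(withFlag); // [ 'ґуля', "з'їсти", 'істота', 'Європа' ]

// Case-sensitive: the leading Є is not in the class, so the last word is cut short.
const withoutFlag = text.match(/[а-яієїґ']+/g);
console.log(withoutFlag); // [ 'ґуля', "з'їсти", 'істота', 'вропа' ]
```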