How to transliterate Cyrillic to Latin text

I have a method that turns Latin-script text (e.g. English, French, German, Polish) into its slug form, for example:

Alpha Bravo Charlie => alpha-bravo-charlie

But it doesn't work for Cyrillic text (e.g. Russian), so I want to transliterate the Cyrillic text to Latin characters first and then slugify the result.
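To see why Cyrillic fails, here is a minimal slugifier of the kind described. It is a hypothetical sketch, not the asker's actual method: `SlugHelper` and `Slugify` are illustrative names, and it assumes a slug should contain only lowercase ASCII letters, digits, and single hyphens. It strips diacritics through Unicode FormD decomposition, so "Crème" becomes "creme", but Cyrillic letters are not ASCII and are all treated as separators. Letters with no decomposition, such as Polish "ł" or German "ß", are also dropped by this sketch.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper illustrating a Latin-only slugifier.
public static class SlugHelper
{
    public static string Slugify(string text)
    {
        // Decompose accented letters ("é" -> "e" + combining acute)
        // so the combining marks can be skipped below.
        string decomposed = text.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        bool pendingHyphen = false;

        foreach (char c in decomposed)
        {
            // Drop combining diacritical marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            if (c < 128 && char.IsLetterOrDigit(c))
            {
                // Emit one hyphen for the preceding run of separators,
                // but never at the start of the slug.
                if (pendingHyphen && sb.Length > 0)
                    sb.Append('-');
                sb.Append(char.ToLowerInvariant(c));
                pendingHyphen = false;
            }
            else
            {
                // Spaces, punctuation and any non-ASCII letter (including
                // all Cyrillic) act as separators; runs collapse to one hyphen.
                pendingHyphen = true;
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, `SlugHelper.Slugify("Alpha Bravo Charlie")` gives `alpha-bravo-charlie`, while `SlugHelper.Slugify("Работа с кириллицей")` gives an empty string, which is why a transliteration step is needed before slugifying.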

Does anyone have a way to do this transliteration, either as source code or as a library?

I'm coding in C#, so a .NET library will work. Alternatively, if you have non-C# code, I'm sure I could convert it.

ckknight asked Dec 03 '09 18:12



2 Answers

You can use UnidecodeSharpFork, an open-source .NET library (DLL), to transliterate Cyrillic and many other scripts to Latin.

Example usage:

```csharp
Assert.AreEqual("Rabota s kirillitsey", "Работа с кириллицей".Unidecode());
Assert.AreEqual("CZSczs", "ČŽŠčžš".Unidecode());
Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());
```

Testing Cyrillic:

```csharp
/// <summary>
/// According to http://en.wikipedia.org/wiki/Romanization_of_Russian BGN/PCGN.
/// http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian
/// With converting "ё" to "yo".
/// </summary>
[TestMethod]
public void RussianAlphabetTest()
{
    string russianAlphabetLowercase = "а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я";
    string russianAlphabetUppercase = "А Б В Г Д Е Ё Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я";

    string expectedLowercase = "a b v g d e yo zh z i y k l m n o p r s t u f kh ts ch sh shch \" y ' e yu ya";
    string expectedUppercase = "A B V G D E Yo Zh Z I Y K L M N O P R S T U F Kh Ts Ch Sh Shch \" Y ' E Yu Ya";

    Assert.AreEqual(expectedLowercase, russianAlphabetLowercase.Unidecode());
    Assert.AreEqual(expectedUppercase, russianAlphabetUppercase.Unidecode());
}
```

It is simple and fast, and the transliteration table is easy to extend or modify if you need to.

Dima Stefantsov answered Oct 05 '22 06:10



```csharp
public static string Translit(string str)
{
    // Latin equivalents (BGN/PCGN-style, with "ё" -> "yo"),
    // index-aligned with the Cyrillic arrays below.
    string[] lat_up = {"A", "B", "V", "G", "D", "E", "Yo", "Zh", "Z", "I", "Y", "K", "L", "M", "N", "O", "P", "R", "S", "T", "U", "F", "Kh", "Ts", "Ch", "Sh", "Shch", "\"", "Y", "'", "E", "Yu", "Ya"};
    string[] lat_low = {"a", "b", "v", "g", "d", "e", "yo", "zh", "z", "i", "y", "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch", "\"", "y", "'", "e", "yu", "ya"};
    // The 33 letters of the Russian alphabet, in alphabetical order.
    string[] rus_up = {"А", "Б", "В", "Г", "Д", "Е", "Ё", "Ж", "З", "И", "Й", "К", "Л", "М", "Н", "О", "П", "Р", "С", "Т", "У", "Ф", "Х", "Ц", "Ч", "Ш", "Щ", "Ъ", "Ы", "Ь", "Э", "Ю", "Я"};
    string[] rus_low = {"а", "б", "в", "г", "д", "е", "ё", "ж", "з", "и", "й", "к", "л", "м", "н", "о", "п", "р", "с", "т", "у", "ф", "х", "ц", "ч", "ш", "щ", "ъ", "ы", "ь", "э", "ю", "я"};

    // Replace every upper- and lowercase Cyrillic letter with its Latin equivalent.
    // The replacements contain no Cyrillic, so later passes never re-process earlier output.
    for (int i = 0; i < rus_up.Length; i++)
    {
        str = str.Replace(rus_up[i], lat_up[i]);
        str = str.Replace(rus_low[i], lat_low[i]);
    }
    return str;
}
```
Romkar answered Oct 05 '22 06:10
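The `Translit` method above calls `String.Replace` 66 times, and each call scans the whole string. A single pass with a lookup table gives the same result for Russian letters in one scan. This is an alternative sketch, not part of either answer: `RussianTranslit` is a hypothetical name, the table is copied from the answer's arrays, and uppercase letters are handled by capitalizing the first Latin letter of the lowercase mapping, which reproduces the `lat_up` entries ("Zh", "Shch", and so on).

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical single-pass variant of the Translit method.
public static class RussianTranslit
{
    // Lowercase Russian letter -> Latin (same table as lat_low/rus_low).
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        ['а'] = "a", ['б'] = "b", ['в'] = "v", ['г'] = "g", ['д'] = "d",
        ['е'] = "e", ['ё'] = "yo", ['ж'] = "zh", ['з'] = "z", ['и'] = "i",
        ['й'] = "y", ['к'] = "k", ['л'] = "l", ['м'] = "m", ['н'] = "n",
        ['о'] = "o", ['п'] = "p", ['р'] = "r", ['с'] = "s", ['т'] = "t",
        ['у'] = "u", ['ф'] = "f", ['х'] = "kh", ['ц'] = "ts", ['ч'] = "ch",
        ['ш'] = "sh", ['щ'] = "shch", ['ъ'] = "\"", ['ы'] = "y", ['ь'] = "'",
        ['э'] = "e", ['ю'] = "yu", ['я'] = "ya",
    };

    public static string Translit(string str)
    {
        var sb = new StringBuilder(str.Length * 2);
        foreach (char c in str)
        {
            char lower = char.ToLowerInvariant(c);
            if (Map.TryGetValue(lower, out string latin))
            {
                // Uppercase input: capitalize only the first Latin letter
                // ("Щ" -> "Shch"); signs like "Ъ" map to '"' unchanged.
                if (c != lower && char.IsLetter(latin[0]))
                    sb.Append(char.ToUpperInvariant(latin[0])).Append(latin, 1, latin.Length - 1);
                else
                    sb.Append(latin);
            }
            else
            {
                // Not a Russian letter: copy it through unchanged.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, `RussianTranslit.Translit("Работа с кириллицей")` returns `Rabota s kirillitsey`, which can then be passed to an existing Latin-only slug method. Characters outside the table (Latin text, digits, punctuation, Ukrainian letters such as "і") pass through unchanged.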
