
How to convert Turkish chars to English chars in a string?

Tags: c#, encoding

string strTurkish = "ÜST";

How can I change the value of strTurkish to "UST"?

ozgun asked Dec 01 '12 15:12


4 Answers

You can use the following method to solve your problem. The other methods do not convert the Turkish lowercase dotless ı (\u0131) correctly.

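// Requires: using System.Globalization; using System.Text;
// Note: on .NET Core / .NET 5+, code page 1252 is only available after
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called.
public static string RemoveDiacritics(string text)
{
    Encoding srcEncoding = Encoding.UTF8;
    Encoding destEncoding = Encoding.GetEncoding(1252); // Windows-1252 (Western European Latin)

    // Round-trip the text through code page 1252; characters outside that
    // code page, such as \u0131, go through the encoder's fallback.
    text = destEncoding.GetString(Encoding.Convert(srcEncoding, destEncoding, srcEncoding.GetBytes(text)));

    // Decompose each remaining accented letter into base letter + combining mark.
    string normalizedString = text.Normalize(NormalizationForm.FormD);
    StringBuilder result = new StringBuilder();

    // Keep everything except the combining (non-spacing) marks.
    for (int i = 0; i < normalizedString.Length; i++)
    {
        if (!CharUnicodeInfo.GetUnicodeCategory(normalizedString[i]).Equals(UnicodeCategory.NonSpacingMark))
        {
            result.Append(normalizedString[i]);
        }
    }

    return result.ToString();
}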
ogun answered Nov 11 '22 04:11


// Requires: using System.Globalization; using System.Linq; using System.Text;
var text = "ÜST";
var unaccentedText = String.Join("", text.Normalize(NormalizationForm.FormD)
        .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
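
One gap worth noting: the dotless ı (U+0131) has no canonical decomposition, so stripping combining marks after normalization leaves it untouched. The following is a minimal sketch of one way to cover it, assuming a plain replacement of ı with i before normalizing is acceptable for your data. The names DiacriticsSketch and StripMarks are hypothetical and not part of the answer above.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

static class DiacriticsSketch
{
    // Hypothetical helper: map dotless ı by hand, then strip combining marks.
    public static string StripMarks(string text)
    {
        // U+0131 does not decompose under FormD, so replace it explicitly.
        string prepared = text.Replace('\u0131', 'i');

        // Decompose accented letters, then drop the non-spacing marks.
        var kept = prepared.Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

        return string.Concat(kept);
    }

    static void Main()
    {
        Console.WriteLine(StripMarks("\u00DCST"));             // UST
        Console.WriteLine(StripMarks("\u0131\u015F\u0131k"));  // isik
    }
}
```

The explicit Replace is only needed for characters that lack a decomposition; the other Turkish letters (ğ, İ, ç, ş, ö, ü) decompose into a base letter plus a combining mark and are handled by the normalization step.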
L.B answered Nov 11 '22 02:11


I'm not an expert on this sort of thing, but I think you can use string.Normalize to do it, by decomposing the value and then effectively removing any non-ASCII characters:

using System;
using System.Linq;
using System.Text;

class Test
{
    static void Main()
    {
        string text = "\u00DCST";
        string normalized = text.Normalize(NormalizationForm.FormD);
        string asciiOnly = new string(normalized.Where(c => c < 128).ToArray());
        Console.WriteLine(asciiOnly);
    }    
}

It's entirely possible that this does horrible things in some cases though.
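
To make that caveat concrete, here is a small sketch that wraps the same filter in a helper and runs it on inputs containing characters with no canonical decomposition. The names AsciiFilterDemo and ToAsciiOnly are hypothetical; the filtering logic is the same as in the code above.

```csharp
using System;
using System.Linq;
using System.Text;

static class AsciiFilterDemo
{
    // Hypothetical wrapper around the decompose-then-filter technique.
    public static string ToAsciiOnly(string text)
    {
        string normalized = text.Normalize(NormalizationForm.FormD);
        // Anything at or above U+0080 is discarded, including letters
        // that did not decompose into an ASCII base character.
        return new string(normalized.Where(c => c < 128).ToArray());
    }

    static void Main()
    {
        Console.WriteLine(ToAsciiOnly("\u00DCST"));             // UST
        Console.WriteLine(ToAsciiOnly("\u0131\u015F\u0131k"));  // sk  (both dotless ı characters are dropped)
        Console.WriteLine(ToAsciiOnly("Stra\u00DFe"));          // Strae  (ß is dropped)
    }
}
```

Characters such as ı (U+0131) and ß (U+00DF) stay above the ASCII range after decomposition, so the filter removes them entirely instead of replacing them with a similar letter.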

Jon Skeet answered Nov 11 '22 02:11


public string TurkishCharacterToEnglish(string text)
{
    char[] turkishChars = {'ı', 'ğ', 'İ', 'Ğ', 'ç', 'Ç', 'ş', 'Ş', 'ö', 'Ö', 'ü', 'Ü'};
    char[] englishChars = {'i', 'g', 'I', 'G', 'c', 'C', 's', 'S', 'o', 'O', 'u', 'U'};
    
    // Replace each Turkish character with the English character at the same index
    for (int i = 0; i < turkishChars.Length; i++)
        text = text.Replace(turkishChars[i], englishChars[i]);

    return text;
}
Umut D. answered Nov 11 '22 03:11