
String replace diacritics in C# [duplicate]

I'd like to use this method to create user-friendly URLs. Because my site is in Croatian, there are characters that I don't want to strip but rather replace with others. For example, this string:

ŠĐĆŽ šđčćž

needs to be:

sdcz-sdccz

So, I would like to make two arrays, one containing the characters to be replaced and the other containing their replacements:

string[] character = { "Š", "Đ", "Č", "Ć", "Ž", "š", "đ", "č", "ć", "ž" };
string[] characterReplace = { "s", "d", "c", "c", "z", "s", "d", "c", "c", "z" };

Finally, these two arrays should be used in a method that takes a string, finds the matches, and replaces them. In PHP I used the preg_replace function for this. In C# this doesn't work:

s = Regex.Replace(s, character, characterReplace);

I would appreciate it if someone could help.
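
For reference, Regex.Replace has no overload that accepts arrays of patterns and replacements, which is why the call above does not compile. A minimal sketch of the two-array idea, assuming the arrays are paired by index, could loop over them with string.Replace instead. The class name ArrayReplaceSketch and the method ReplaceAll are hypothetical names chosen for illustration:

```csharp
using System;

static class ArrayReplaceSketch
{
    // Arrays from the question, paired by index.
    static readonly string[] Character = { "Š", "Đ", "Č", "Ć", "Ž", "š", "đ", "č", "ć", "ž" };
    static readonly string[] CharacterReplace = { "s", "d", "c", "c", "z", "s", "d", "c", "c", "z" };

    // Hypothetical helper: applies each (character, replacement) pair in turn.
    public static string ReplaceAll(string s)
    {
        for (int i = 0; i < Character.Length; i++)
        {
            // string.Replace(string, string) does an ordinal, case-sensitive replacement.
            s = s.Replace(Character[i], CharacterReplace[i]);
        }
        return s;
    }
}
```

Calling ArrayReplaceSketch.ReplaceAll("ŠĐĆŽ šđčćž") leaves the space untouched, so turning spaces into hyphens would need one more Replace call.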

ilija veselica asked Apr 02 '10 13:04


1 Answer

It seems you want to strip off diacritics and leave the base character. I'd recommend Ben Lings's solution here for this:

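using System.Globalization;
using System.Linq;
using System.Text;

string input = "ŠĐĆŽ šđčćž";
// Decompose each accented letter into a base letter plus combining marks.
string decomposed = input.Normalize(NormalizationForm.FormD);
// Drop the combining marks, keeping only the base characters.
char[] filtered = decomposed
    .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
    .ToArray();
string newString = new string(filtered);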

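Edit: Slight problem! It doesn't work for the Đ, because Đ and đ have no Unicode decomposition into a base letter plus a combining mark, so normalization leaves them unchanged. The result is: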

SĐCZ sđccz
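
One possible workaround, sketched here as an assumption rather than as part of the original solution, is to map Đ and đ by hand before normalizing, then lower-case the result and turn spaces into hyphens to get the URL form the question asks for. The class name SlugSketch and the method ToSlug are hypothetical:

```csharp
using System;
using System.Globalization;
using System.Text;

static class SlugSketch
{
    // Hypothetical helper, not part of the original answer.
    public static string ToSlug(string input)
    {
        // Đ/đ have no canonical decomposition, so map them explicitly first.
        string mapped = input.Replace('Đ', 'D').Replace('đ', 'd');

        // Decompose, then drop the combining marks (same idea as the snippet above).
        string decomposed = mapped.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Lower-case and replace spaces with hyphens, as in the question's example.
        return sb.ToString().ToLowerInvariant().Replace(' ', '-');
    }
}
```

With the input from the question, SlugSketch.ToSlug("ŠĐĆŽ šđčćž") should give sdcz-sdccz. Other letters without a decomposition (for example ø or ł) would need the same kind of explicit mapping.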
Mark Byers answered Oct 03 '22 19:10