
Is it possible to convert language-specific characters to Latin characters in UTF-8?

Tags: unicode, c#-4.0

I am wondering whether there is any relationship or existing algorithm that allows converting national characters to their equivalent basic Latin characters in UTF-8 encoded text.

For example (in Polish):

Ą -> A

Ó -> O

ż -> z

ź -> z ...

A phrase like: 'zażółć gęślą jaźń'

converts to: 'zazolc gesla jazn'

Currently I am using a conversion array for Polish, but I am looking for a universal solution that handles all Latin-based languages.

Thanks

tomekole asked Jun 14 '11 10:06


2 Answers

Check this:

http://sourceforge.net/projects/iconvnet/

More generally, search for iconv, the character-set conversion library.

carlo.borreo answered Jan 02 '23 09:01


To make the answer complete: searching for 'Unicode decomposition + C#' led me to this CodeProject article (codeproject.com/KB/cs/UnicodeNormalization.aspx?display=Print), which offers a ready-to-use solution. The ability to name what you are looking for can't be overestimated ;) Thanks for all the answers.
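
For readers who want to see the idea without following the link, here is a minimal sketch of the Unicode decomposition approach in C#. It is not the code from the CodeProject article; the class and method names (`DiacriticsSketch`, `RemoveDiacritics`) are made up for illustration. It relies only on the standard `string.Normalize` and `CharUnicodeInfo.GetUnicodeCategory` calls. One assumption worth flagging: some letters, such as the Polish 'ł', have no canonical decomposition, so the sketch maps them by hand, and other languages would likely need more entries of that kind.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsSketch
{
    // Hypothetical helper, not taken from the linked article.
    public static string RemoveDiacritics(string input)
    {
        // FormD splits precomposed letters into a base letter plus combining
        // marks, e.g. "ą" becomes "a" followed by U+0328 (combining ogonek).
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var result = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Drop the combining marks and keep everything else.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
            {
                continue;
            }

            // Assumption: letters without a canonical decomposition need a
            // manual mapping. Only the Polish case is covered here.
            switch (c)
            {
                case 'ł': result.Append('l'); break;
                case 'Ł': result.Append('L'); break;
                default: result.Append(c); break;
            }
        }

        // Recompose whatever is left into the usual composed form.
        return result.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, `DiacriticsSketch.RemoveDiacritics("zażółć gęślą jaźń")` returns "zazolc gesla jazn". Characters outside the Latin script pass through unchanged, since decomposition only removes combining marks and does not transliterate.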

tomekole answered Jan 02 '23 11:01