I am wondering whether there are any existing mappings or algorithms for converting national (accented) characters to their equivalent plain Latin characters in UTF-8 text.
For example (in Polish):
Ą -> A
Ó -> O
ż -> z
ź -> z ...
so a phrase like 'zażółć gęślą jażń'
should convert to: 'zazolc gesla jazn'
Currently I am using a conversion array for Polish, but I am looking for a universal solution that handles all Latin-based languages.
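For reference, a per-language conversion table like the one described above might look roughly like this in Python. This is a minimal sketch: the asker's actual array isn't shown, so the `POLISH_TO_ASCII` table and `strip_polish` function are hypothetical. It covers only the 18 Polish diacritic letters, which is exactly why it doesn't scale to other languages.

```python
# Hypothetical Polish-only mapping table (the asker's real array isn't shown).
# Each character in the first string maps to the character at the same
# position in the second string.
POLISH_TO_ASCII = str.maketrans(
    "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
    "acelnoszzACELNOSZZ",
)

def strip_polish(text: str) -> str:
    """Replace each Polish diacritic letter with its base Latin letter."""
    return text.translate(POLISH_TO_ASCII)

print(strip_polish("zażółć gęślą jażń"))  # zazolc gesla jazn
```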
Thanks
Check this:
http://sourceforge.net/projects/iconvnet/
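More generally, look for iconv: the GNU iconv tool and library can transliterate text to ASCII when the target encoding is given as ASCII//TRANSLIT.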
To make the answer complete: searching for 'Unicode decomposition + C#' led me to this CodeProject article (codeproject.com/KB/cs/UnicodeNormalization.aspx?display=Print), which offers a ready-to-use solution. The value of knowing the right name for what you are looking for can't be overstated ;) Thanks for all the answers.
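The general idea behind Unicode decomposition can be sketched in Python with the standard `unicodedata` module (the article itself is in C#, where `String.Normalize(NormalizationForm.FormD)` plus a check for `UnicodeCategory.NonSpacingMark` plays the same role; the article's own code may differ). Normalization Form D splits a letter such as "ż" into "z" followed by a combining mark, and dropping the marks leaves the base letter. One caveat: some letters, including Polish "ł", have no canonical decomposition, so this sketch adds a small hypothetical `FALLBACK` table for them. That table is illustrative, not exhaustive.

```python
import unicodedata

# Letters that NFD leaves intact because they have no canonical decomposition.
# Illustrative fallback only; extend it for the languages you need.
FALLBACK = str.maketrans({
    "ł": "l", "Ł": "L",
    "ø": "o", "Ø": "O",
    "đ": "d", "Đ": "D",
    "ß": "ss",
    "æ": "ae", "Æ": "AE",
    "œ": "oe", "Œ": "OE",
})

def to_ascii_latin(text: str) -> str:
    """Strip diacritics from Latin-script text (hypothetical helper)."""
    # NFD splits e.g. "ż" into "z" + U+0307 COMBINING DOT ABOVE.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (Unicode category "Mn").
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Handle letters that have no decomposition.
    return stripped.translate(FALLBACK)

print(to_ascii_latin("zażółć gęślą jażń"))  # zazolc gesla jazn
```

Because the decomposition data comes from Unicode itself, the same function also handles, for example, French "Crème brûlée" or German "Größe" (via the "ß" fallback) without a per-language table.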