I need to split a Unicode string into its individual characters.
For example, in Tamil:
"கமலி" => 'க', 'ம', 'லி'
I'm able to get the Unicode bytes, but producing the characters from them has become a problem.
byte[] stringBytes = Encoding.Unicode.GetBytes("கமலி");
char[] stringChars = Encoding.Unicode.GetChars(stringBytes);
foreach (var crt in stringChars)
{
    Trace.WriteLine(crt);
}
It gives this result:
'க'=>0x0b95
'ம'=>0x0bae
'ல'=>0x0bb2
'ி'=>0x0bbf
So the problem is how to extract 'லி' as the single character 'லி', without splitting it into 'ல' and 'ி'.
In Indian languages it is natural to treat a consonant plus its vowel sign as a single character, but C# represents them as separate char values, which makes parsing difficult.
All I need is for the string to be split into 3 characters.
To iterate over graphemes you can use the methods of the StringInfo class.
Each combination of base character + combining characters is called a 'text element' by the .NET documentation, and you can iterate over them using a TextElementEnumerator:
var str = "கமலி";
var enumerator = System.Globalization.StringInfo.GetTextElementEnumerator(str);
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}
Output:
க
ம
லி
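If you need the pieces as a collection rather than printed output, the same enumerator can fill a list. This is only a minimal sketch: the class name GraphemeSplitter and the method Split are hypothetical helpers made up for illustration, not part of .NET. The only framework calls used are StringInfo.GetTextElementEnumerator and TextElementEnumerator.GetTextElement.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration; not part of the .NET class library.
public static class GraphemeSplitter
{
    // Returns each text element (base character plus its combining
    // characters) of the input as a separate string.
    public static List<string> Split(string text)
    {
        var elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(text);
        while (enumerator.MoveNext())
        {
            // GetTextElement() returns the current text element as a string.
            elements.Add(enumerator.GetTextElement());
        }
        return elements;
    }
}

// Usage: List<string> parts = GraphemeSplitter.Split("கமலி");
// For this input, parts should hold the same three elements printed above.
```

Returning strings instead of char values is deliberate: a single text element such as 'லி' spans more than one char, so it cannot be stored in a char.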