I have a problem inserting a string into a database because of an encoding issue.
The string comes from an external RSS feed. In a web browser it looks OK. Even in the debugger the text appears to be OK. If I copy the string to Notepad, the result is also OK.
But in Notepad++ I could see that the string uses combining characters. If I switch the encoding to ANSI, the base letter and the accent appear separately, e.g.
á is displayed as a´
(In Notepad++ it is like having two characters, one over the other. I can even select half of the character.)
I googled a lot and tried several different approaches to this problem. I really want to find a clean way to convert a string with combining diacritics into simple, database-compatible UTF-8 characters.
Any help? Thank you so much!
This should work for you. Strings are immutable, so Normalize returns a new string and the result has to be used:
output = output.Normalize(NormalizationForm.FormC);
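This little test gave 3, 2, 3. The middle string correctly combines A and its diacritic into a single code point (Â, U+00C2), which takes two bytes in UTF-8. The last string stays at three bytes because Unicode has no precomposed T with a circumflex, so FormC leaves it unchanged.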
Console.WriteLine(Encoding.UTF8.GetByteCount("A\u0302")); // 3: 'A' (1 byte) + U+0302 (2 bytes)
Console.WriteLine(Encoding.UTF8.GetByteCount("A\u0302".Normalize(NormalizationForm.FormC))); // 2: composed into U+00C2
Console.WriteLine(Encoding.UTF8.GetByteCount("T\u0302".Normalize(NormalizationForm.FormC))); // 3: no precomposed form exists
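For the original use case (cleaning feed text before it goes into the database), the same call could be wrapped in a small helper. This is only a sketch: the class name TextCleanup and the method name ToComposed are made up for illustration and are not part of .NET. Only String.Normalize, String.IsNormalized and NormalizationForm come from the framework.

```csharp
using System;
using System.Text;

// Hypothetical helper, named here for illustration only.
public static class TextCleanup
{
    // Returns the FormC (precomposed) version of the input.
    // Null or empty input is returned unchanged.
    public static string ToComposed(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // Avoid allocating a new string when the text is already in FormC.
        return input.IsNormalized(NormalizationForm.FormC)
            ? input
            : input.Normalize(NormalizationForm.FormC);
    }
}
```

Calling TextCleanup.ToComposed("a\u0301") on the example from the question should give back á as the single code point U+00E1. Sequences without a precomposed equivalent, such as "T\u0302", come back as two code points, so the database column still needs to accept combining characters.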