I am getting a string from a third-party program that I don't control, and my piece of the code outputs it as HTML. This works fine in English, but in other languages the text is displayed incorrectly. For example, accented characters in Spanish come out garbled, and characters in East Asian languages (e.g. Korean) are completely unreadable. I am pretty sure I need to do some encoding work so that all languages display correctly.
My understanding of encoding is fairly poor, so before posting the real question, which I intuitively think is "How do I encode this to UTF-8 in C#?", I would like to get a better understanding of the matter by posting simpler questions.
My question here is: how do I know which encoding my input string has? In Spanish, it looks like this when I get an accent: "Acción", instead of "Acción". Is this ANSI, or what am I dealing with?
Thanks a lot in advance!
I get an accent: "Acción"
The presence of the à character is a dead give-away. Accented capital A characters have character code 0xC0 and up. Which is often the first byte in a two-byte utf-8 encoded character. The ó glyph is codepoint U+00F3, the utf-8 encoding for it is 0xC3 + 0xB3. Which are the codepoints for à and ³
The strings are encoded in UTF-8, but you are reading them with an 8-bit encoding like Encoding.Default.
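To see the mismatch in code, here is a minimal sketch. It assumes the 8-bit encoding in play behaves like ISO-8859-1 (Latin-1) for these bytes; your actual Encoding.Default may be a different code page such as Windows-1252. `RepairMojibake` is a hypothetical helper name used only for illustration, and the non-ASCII characters are written as escapes so the example does not depend on how the source file itself is saved.

```csharp
using System;
using System.Text;

// Assumption: the wrong 8-bit encoding is (or behaves like) ISO-8859-1.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");

// "Acción" as the third-party program presumably emits it: UTF-8 bytes.
byte[] utf8Bytes = Encoding.UTF8.GetBytes("Acci\u00F3n");
Console.WriteLine(BitConverter.ToString(utf8Bytes)); // 41-63-63-69-C3-B3-6E

// Decoding those bytes with an 8-bit encoding turns C3 B3 into two characters.
string wrong = latin1.GetString(utf8Bytes);
Console.WriteLine(wrong == "Acci\u00C3\u00B3n"); // True: this is "Acción"

// Decoding the same bytes as UTF-8 gives the intended text.
string right = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(right == "Acci\u00F3n"); // True

// Hypothetical helper: undo the damage when only the garbled string is available.
// It re-encodes with the 8-bit encoding to recover the original bytes,
// then decodes those bytes as UTF-8.
static string RepairMojibake(string garbled)
{
    Encoding eightBit = Encoding.GetEncoding("ISO-8859-1");
    byte[] originalBytes = eightBit.GetBytes(garbled);
    return Encoding.UTF8.GetString(originalBytes);
}

Console.WriteLine(RepairMojibake(wrong) == right); // True
```

If you control the point where the bytes are first read, it is usually cleaner to decode them as UTF-8 there (for example by passing Encoding.UTF8 to the reader) than to repair strings afterwards. The round trip above only works when the 8-bit decoding did not lose any bytes.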