How to know string encoding in C#

Question

I am getting a string from a third-party program that I don't control. My piece of the code outputs it in HTML. This works fine in English, but in other languages the text comes out garbled. For example, accented characters in Spanish are displayed wrongly, and characters in East Asian languages (e.g. Korean) are unreadable. I am pretty sure I need to do some encoding work so that all languages display correctly.

My understanding of encoding is rather poor, so before posting the real question, which I suspect is "How do I encode this to UTF-8 in C#?", I would like to understand the matter better by asking simpler questions.

My question here is: how do I know which encoding my input string uses? In Spanish, an accented word comes out as "AcciÃ³n" instead of "Acción". Is this ANSI, or what am I dealing with?

Thanks a lot in advance!

Hans Passant · Accepted Answer

I get an accent: "AcciÃ³n"

The presence of the Ã character is a dead giveaway. Accented capital A characters have character codes 0xC0 and up, and bytes in that range are often the first byte of a two-byte UTF-8 sequence. The ó glyph is code point U+00F3, and its UTF-8 encoding is the byte pair 0xC3 0xB3. Read one byte at a time in an 8-bit encoding such as Latin-1 or Windows-1252, those two bytes are the characters Ã and ³.
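The byte arithmetic above can be checked with a few lines of C#. This is only a sketch: the class and method names (ByteInspection, Utf8Hex, MisreadAsLatin1) are made up for illustration, and ISO-8859-1 stands in for whatever 8-bit encoding the reader actually used. String literals use \u escapes so the result does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

public static class ByteInspection
{
    // Returns the UTF-8 bytes of the text as hyphen-separated hex.
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }

    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // which maps every byte to exactly one character.
    public static string MisreadAsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
    }

    public static void Main()
    {
        // U+00F3 (ó) becomes the two bytes C3 B3 in UTF-8.
        Console.WriteLine(Utf8Hex("\u00F3")); // C3-B3

        // Those bytes read as Latin-1 are U+00C3 (Ã) and U+00B3 (³).
        string misread = MisreadAsLatin1("Acci\u00F3n");
        Console.WriteLine(misread == "Acci\u00C3\u00B3n"); // True
    }
}
```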

The strings are encoded in UTF-8, but you are reading them with an 8-bit encoding like Encoding.Default.
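In code, the fix would be to decode the incoming bytes as UTF-8 at the point where they are read. The sketch below assumes you can get at the raw bytes; EncodingFix, DecodeUtf8 and RepairMisreadUtf8 are hypothetical names, not part of the asker's program. The second method is a stopgap for the case where only the already-garbled string is available, and it assumes the wrong decoding was ISO-8859-1. If the reader actually used Windows-1252, some bytes may not round-trip this way.

```csharp
using System;
using System.Text;

public static class EncodingFix
{
    // Preferred: decode the raw bytes as UTF-8 in the first place.
    public static string DecodeUtf8(byte[] rawBytes)
    {
        return Encoding.UTF8.GetString(rawBytes);
    }

    // Stopgap: turn the garbled characters back into the bytes they
    // came from (assuming ISO-8859-1), then decode those bytes as UTF-8.
    public static string RepairMisreadUtf8(string garbled)
    {
        byte[] originalBytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        // The bytes of "Acción" as the third-party program would emit them in UTF-8.
        byte[] raw = { 0x41, 0x63, 0x63, 0x69, 0xC3, 0xB3, 0x6E };
        Console.WriteLine(DecodeUtf8(raw) == "Acci\u00F3n"); // True

        // "AcciÃ³n" written with escapes: Ã is U+00C3, ³ is U+00B3.
        string garbled = "Acci\u00C3\u00B3n";
        Console.WriteLine(RepairMisreadUtf8(garbled) == "Acci\u00F3n"); // True
    }
}
```

If the string arrives through a StreamReader or a similar API, passing Encoding.UTF8 to its constructor achieves the same thing as DecodeUtf8 without handling the bytes yourself.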

Tags:

character-encoding

c#-4.0

Gaara
