What character encoding is c3 82 c2 bf?

Question

I have a source of text data that includes the byte sequence c3 82 c2 bf. In context I think it's supposed to be a capital Greek Phi symbol (Φ).

Anyway I can't figure out what encoding is being used; I'm writing a Python script to process this data into a database that expects Unicode, and it throws an exception on this particular sequence of data.

Any suggestions on how to handle it?

Kevin · Accepted Answer

FWIW, I ended up with c3 82 c2 bf from &nbsp;. I did not dig into the transformations because I was able to simply throw that part of the code away. Suffice it to say that &nbsp; was in an HTML email template that was processed by a WordPress (PHP) plugin.

Jukka K. Korpela · Answer

Interpreted as UTF-8, c3 82 is “Â” U+00C2 and c2 bf is “¿” U+00BF, which does not make much sense, but it’s technically valid UTF-8 data, so it should not be reported as a character-level data error. Interpreted as UTF-16, it’s Hangul syllables and possibly a CJK ideograph, depending on endianness, but still formally valid data, though most probably not what was meant.
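
The interpretations above can be checked directly. This is a minimal sketch, assuming Python 3 and its built-in codecs; the variable names are only illustrative.

```python
# The four bytes from the question.
data = bytes.fromhex("c382c2bf")

# UTF-8 reads them as two 2-byte sequences.
as_utf8 = data.decode("utf-8")

# UTF-16 reads them as two 16-bit code units; the result depends on byte order.
as_utf16_be = data.decode("utf-16-be")
as_utf16_le = data.decode("utf-16-le")

for label, text in [
    ("utf-8", as_utf8),
    ("utf-16-be", as_utf16_be),
    ("utf-16-le", as_utf16_le),
]:
    # Print the code points rather than the glyphs, so the result is readable
    # even on a terminal without Hangul or CJK fonts.
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

All three decodes succeed, which is why none of these codecs flags the data as malformed.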

This sounds like the result of double conversion, but it’s difficult to make educated guesses. If it stands for Φ, then the UTF-16 form is 03 A6 or A6 03 and the UTF-8 form is CE A6, which don’t really resemble the actual data. Information about the origin of the data might help in guessing what transcodings may have happened.
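
For comparison, here is a short sketch that prints the encoded forms of Φ (U+03A6), again assuming Python 3.8 or later for `bytes.hex` with a separator. None of them shares a byte with c3 82 c2 bf.

```python
phi = "\u03a6"  # GREEK CAPITAL LETTER PHI

# Encode the same character under each candidate encoding.
encodings = {name: phi.encode(name) for name in ("utf-8", "utf-16-be", "utf-16-le")}

for name, raw in encodings.items():
    print(name, raw.hex(" "))
```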

Pablo Santa Cruz · Answer

It's probably a double conversion of the Ñ character.

The Ñ character in UTF-8 is 0xc391.

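If you take the Ñ character, already encoded in UTF-8, and convert it from LATIN-1 to UTF-8 a second time, you can end up with something close to 0xc382c2bf.

Why?

0xc382 is the UTF-8 encoding of Â (U+00C2, A with circumflex). Note that the LATIN-1 byte 0xc3 is Ã (A with tilde), whose UTF-8 encoding is 0xc383, so a plain LATIN-1 to UTF-8 pass would differ in this byte.
0xc2bf is the ¿ character, which is what some converters substitute when they can't convert a character from LATIN-1 (0x91 is not a printable character in LATIN-1).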
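
A minimal sketch of the round trip described above, assuming Python's strict `latin-1` and `utf-8` codecs stand in for whatever converter actually touched the data (the thread does not say which one did). Under that assumption the result for Ñ is c3 83 c2 91 rather than c3 82 c2 bf, so a converter would have to behave differently to produce the question's bytes exactly.

```python
original = "\u00d1"  # Ñ

# First, correct encoding: Ñ becomes c3 91.
utf8_once = original.encode("utf-8")

# The mistake: treat those UTF-8 bytes as Latin-1 text, then encode again.
misread = utf8_once.decode("latin-1")
utf8_twice = misread.encode("utf-8")

print(utf8_once.hex(" "))
print(utf8_twice.hex(" "))

# Undoing one round of double encoding reverses the steps.
recovered = utf8_twice.decode("utf-8").encode("latin-1").decode("utf-8")

# Applying the same reversal to the bytes from the question.
question_bytes = bytes.fromhex("c382c2bf")
question_undone = question_bytes.decode("utf-8").encode("latin-1").decode("utf-8")
print(f"U+{ord(question_undone):04X}")
```

The reversal turns the question's bytes into a single ¿ (U+00BF). That is consistent with a replacement character having been substituted before a second UTF-8 encoding, though this is only an inference from the bytes.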

Tags:

encoding

unicode

Jason S